Abstract

In programming languages with dynamic use of memory, such as Java, knowing 
that a reference variable x points to an acyclic data structure is valuable for 
the analysis of termination and resource usage (e.g., execution time or memory 
consumption). For instance, this information guarantees that the depth of the data structure to which x points is greater than the depth of the data structure pointed to by x.f for any field f of x. This, in turn, allows bounding the number of iterations of a loop which traverses the structure by its depth, which is essential in order to prove the termination or infer the resource usage of the loop. The present paper provides an Abstract-Interpretation-based formalization of
a static analysis for inferring acyclicity, which works on the reduced product of 
two abstract domains: reachability, which models the property that the location 
pointed to by a variable w can be reached by dereferencing another variable v 
(in this case, v is said to reach w); and cyclicity, modeling the property that v 
can point to a cyclic data structure. The analysis is proven to be sound and 
optimal with respect to the chosen abstraction. 

Keywords: Abstract Interpretation; Acyclicity Analysis; Termination 
Analysis; Object-Oriented Programming; Heap Manipulation 



1. Introduction 



Programming languages with dynamic memory allocation, such as Java, allow creating and manipulating cyclic data structures. The presence of cyclic data structures in the program memory (the heap) is a challenging issue in the context of termination analysis [7, 10, 1, 29], resource usage analysis [30, 13, 3], garbage collection [22], etc. As an example, consider the loop "while (x!=null) do x:=x.next;": if x points to an acyclic data structure before the loop, then the depth of the data structure to which x points strictly decreases after each iteration; therefore, the number of iterations is bounded by the initial depth of (the structure pointed to by) x. On the other hand, the possibility that x points to a cyclic data structure forbids, in general, proving that the loop terminates.



Automatic inference of such information is typically done by (1) abstracting the loop to a numeric loop "$\mathit{while}(x) \leftarrow \{x > 0,\ x > x'\},\ \mathit{while}(x')$"; and (2) bounding the number of iterations of the numeric loop. The numeric loop means that, if the loop entry is reached with x pointing to a data structure with depth $x > 0$, then it will eventually be reached again with x pointing to a structure with depth $x' < x$. The key point is that "x!=null" is abstracted to $x > 0$, meaning that the depth of a non-null variable cannot be 0; moreover, abstracting "x:=x.next" to $x > x'$ means that the depth decreases when accessing fields. While the former is valid for any structure, the latter holds only if x is acyclic. Therefore, acyclicity information is essential in order to apply such abstractions.
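The depth argument above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a heap of Python objects with a single reference field `next`; the names `Node`, `depth`, and `traverse` are hypothetical and are not part of the analysis. On an acyclic list, each iteration of `while (x!=null) do x:=x.next;` strictly decreases the depth, so the number of iterations is bounded by the initial depth:

```python
from typing import Optional


class Node:
    """Hypothetical heap object with a single reference field `next`."""

    def __init__(self, nxt: Optional["Node"] = None) -> None:
        self.next = nxt


def depth(x: Optional[Node]) -> int:
    """Number of nodes on the path starting at x (0 for null).

    With a single field, the longest path is the only path; it is finite
    only if the structure reachable from x is acyclic.
    """
    seen = set()
    d = 0
    while x is not None:
        if id(x) in seen:
            raise ValueError("cyclic structure: depth is unbounded")
        seen.add(id(x))
        d += 1
        x = x.next
    return d


def traverse(x: Optional[Node]) -> int:
    """Runs `while (x!=null) do x:=x.next;` and counts the iterations."""
    iterations = 0
    while x is not None:
        # The abstraction x > x' is valid here: the depth strictly decreases.
        assert depth(x.next) < depth(x)
        iterations += 1
        x = x.next
    return iterations


head: Optional[Node] = None
for _ in range(5):
    head = Node(head)  # prepend: builds an acyclic list of 5 nodes
print(depth(head), traverse(head))  # 5 5
```

On a cyclic list, `depth` raises `ValueError`, mirroring the fact that no depth-based bound exists in that case.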

In mainstream programming languages with dynamic memory manipulation, data structures can only be modified by means of field updates. If, before x.f:=y, x and y are guaranteed to point to disjoint parts of the heap, then there is no way to create a cycle. On the other hand, if they are not disjoint, i.e., they share a common part of the heap, then a cyclic structure might be created. This simple mechanism has been used in previous work [26] in order to declare x and y, among others, as (possibly) cyclic whenever they share before the update. In the following, we refer to this approach as the sharing-based approach to acyclicity analysis.

The sharing-based approach to acyclicity is simple and efficient; however, it can lose a significant amount of precision on typical programming patterns. E.g., consider "y:=x.next.next; x.next:=y;", which typically removes an element from a linked list, and let x be initially acyclic. After the first command, x and y clearly share, so that both would be declared cyclic after the update even if, clearly, they are not. When considering x.f:=y, the precision of the acyclicity information can be improved if it is possible to know how x and y share. To this end, there are four possible scenarios: (1) x and y alias; (2) x reaches y; (3) y reaches x; (4) they both reach a common location. The field update x.f:=y might create a cycle only in cases (1) and (3). An acyclicity analysis based on similar observations has been considered before in the context of C programs [17], where the analysis has been presented as a data-flow analysis; however, no formal justification for its correctness has been provided. In what follows, we refer to this approach as the reachability-based approach to acyclicity analysis.
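The difference between the two approaches can be sketched as two decision functions over may-information. This is a hypothetical encoding, not the interface of COSTA or of any other analyzer: `share`, `reach`, and `alias` are sets of variable-name pairs, where `(v, w)` in `reach` means that v may reach w.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # hypothetical encoding of a may-statement between two variables


def sharing_based_may_cycle(x: str, y: str, share: Set[Pair]) -> bool:
    """Sharing-based rule: x.f:=y may create a cycle whenever x and y may share."""
    return x == y or (x, y) in share or (y, x) in share


def reachability_based_may_cycle(x: str, y: str, reach: Set[Pair], alias: Set[Pair]) -> bool:
    """Reachability-based rule: x.f:=y may create a cycle only in case (1),
    x and y may alias, or in case (3), y may reach x."""
    may_alias = x == y or (x, y) in alias or (y, x) in alias
    return may_alias or (y, x) in reach


# After "y:=x.next.next" with x acyclic: x reaches y, they share, they do not alias.
share = {("x", "y")}
reach = {("x", "y")}
alias: Set[Pair] = set()
print(sharing_based_may_cycle("x", "y", share))              # True
print(reachability_based_may_cycle("x", "y", reach, alias))  # False
```

On the list-removal pattern above, the sharing-based rule reports a possible cycle, while the reachability-based rule does not.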

1.1. Contributions 

The main contribution of this paper is essentially theoretical. In particular, 
the paper formalizes an existing reachability-based acyclicity analysis [17] within 
the framework of Abstract Interpretation [11], and proves its soundness and 
optimality: 

1. We define an abstract domain $\mathcal{I}_{rc}$, which captures the reachability information about program variables (i.e., whether there can be a path in the heap from the location $\ell_v$ bound to some variable v to the location $\ell_w$ bound to some w), and the acyclicity of data structures (i.e., whether there can be a cyclic path starting from the location bound to some variable).
2. A provably sound and optimal abstract semantics $\mathcal{C}^{\iota}[\![\_]\!](\_)$ of a simple object-oriented language is developed, which works on $\mathcal{I}_{rc}$ and can often guarantee the acyclicity of Directed Acyclic Graphs (DAGs), which would most likely be considered cyclic if only sharing, not reachability, were taken into account. With respect to the original analysis, the definition of the abstract semantics involves additional effort, such as dealing with specific features of object-oriented languages and discussing some technical improvements.

As a proof of concept, the abstract semantics has also been implemented in the COSTA [2] cost and termination analyzer as a component whose results are essential for proving the termination or inferring the resource usage of programs written in Java bytecode. Since it targets full Java bytecode, the implementation also has to deal with advanced features of the language such as exceptions and static fields.

The present paper is based on preliminary work by the same authors which 
was published as a short workshop version [15] and as a technical report [16]. 

1.2. Related work 

A reachability-based acyclicity analysis for C programs was developed in [17]; 
however, that analysis was presented as a data-flow analysis, and it did not 
include any formal justification of its correctness. Our paper provides a formal- 
ization of a similar analysis in terms of Abstract Interpretation, and includes 
soundness proofs. Note that [17] uses the terms "direction" and "interference", 
respectively, for reachability and sharing. 

As far as Abstract-Interpretation-formalized cyclicity analyses are concerned, 
the one by Rossignoli and Spoto [26] is the most related work. This analysis is 
only based on sharing (not on reachability), and, as discussed in the paper, is 
less precise than the reachability-based approach. 

The work on Shape Analysis [31] is related because it reasons about heap- 
manipulating programs in order to prove program properties. In most cases, 
safety properties are dealt with [6, 27, 25]. On the other hand, termination 
is a liveness property, and is typically the final property to be proven when 
analyzing acyclicity. Therefore, work on liveness properties will be considered 
more deeply. Most papers [24, 4, 7, 10, 9] use techniques based on Model Checking [23], Predicate Abstraction [20], Separation Logic [24] or Cyclic proofs [9] to prove properties for programs which work on singly-linked heaps. This means that only one heap cell is directly reachable from another one, which is basically the same as having, in an object-oriented language, only one class with one field. This somehow restricts the structure of the heap and, in some cases, allows obtaining more precise results. On the contrary, the present paper deals
with a technique which does not rely on such an assumption: as the language is 
object-oriented, every object can have multiple fields. Other works [5] deal with 
single-parent heaps, which are multi-linked but sharing-free; needless to say, the 
present paper handles heap structures where sharing is more than a possibility. 
There also exist other works [19] based on Separation Logic which efficiently prove program properties and deal with cyclic structures, but are specialized to a limited set of data structures like singly-linked lists, doubly-linked lists or trees. Also, in most of these works, the heap size is bounded by some constant, which is also a minor limitation. On the contrary, the present paper deals with data structures which can have practically any shape, and tries to infer information about the shape on its own. It is worth pointing out that the acyclicity analysis under discussion does not focus on directly proving liveness properties; instead, it is supposed to provide useful information to a cost¹ or termination analyzer which will perform the task.

1.3. Organization 

The rest of the paper is organized as follows: Section 2 presents an example of reachability-based acyclicity analysis. Section 3 defines the syntax and semantics of a simple Java-like language. Section 4 introduces the abstract domains for reachability and cyclicity, and their reduced product, and Section 5 defines the abstract semantics and proves some important properties. Finally, Section 6 concludes the paper. Proofs of the technical results are available in Appendix A.

2. An example of reachability-based acyclicity analysis 

This section describes the essentials of the reachability-based acyclicity analysis [17], and its advantages over the sharing-based one, by means of an example. This example will also be used in the rest of the paper to illustrate the different technical parts of the analysis.

Consider the program depicted in Figure 1. The class OrderedList implements an ordered linked list with two fields: head and lastInserted point to, respectively, the first element of the list and the last element which has been inserted. The class Node implements a linked list in the standard way, with two fields value and next. Figure 2 shows a possible instance of OrderedList. The method insert adds a new element to the ordered list: it takes an integer i, creates a new node n for i (lines 9-10), looks for the position pos of n (lines 11-16), adds n to the list (lines 17-20), makes lastInserted point to the new node (line 22), and finally returns pos (line 23). The goal is to infer that a call of the form "x.insert(i)" never makes x cyclic. This is important since, when such a call is involved in a loop like the following one

    x:=new OrderedList;
    while (j>0) do { i:=read(); x.insert(i); j:=j-1; }

if x cannot be proven to be acyclic after insert, then it must be assumed to be cyclic from the second iteration on. This, in turn, prevents proving termination of the loop at lines 12-16, since it might be traversing a cycle.



¹This analysis is actually implemented in COSTA, which handles both cost and termination analysis of Java bytecode programs.
1  class OrderedList {
2    Node head, lastInserted;
3
4    int insert(int i) {
5      Node c,p,n;
6      int pos;
7                                       // I7  = ∅
8      pos:=0;                          // I8  = ∅
9      n:=new Node;                     // I9  = ∅
10     n.value:=i;                      // I10 = ∅
11     c:=this.head;                    // I11 = {this⇝c}
12     while (c!=null && c.value<i) do {
13       pos:=pos+1;                    // I13 = {this⇝c, ...}
14       p:=c;                          // I14 = {this⇝c, ...}
15       c:=c.next;                     // I15 = {this⇝c, ...}
16     }                                // I16 = {this⇝c, ...}
17     n.next:=c;                       // I17 = {this⇝c, this⇝p, p⇝c, n⇝c}
18     if (p=null)
19       then this.head:=n;             // I19 = {this⇝c, this⇝p, this⇝n, p⇝c, n⇝c}
20       else p.next:=n;                // I20 = I17 ∪ {this⇝n, p⇝n}
21                                      // I21 = I19 ∪ I20 = I20
22     this.lastInserted:=n;            // I22 = I21
23     return pos+1;                    // I23 = I22
24   }
25 }
26
27 class Node {
28   Node next;
29   int value;
30 }

Figure 1: The running example and the result of the analysis, put in comments. 

The challenge in this example is to prove that the instructions at lines 19 and 20 do not make any data structure cyclic. This is not trivial since this, p, and n share with each other at line 17; depending on how they share, the corresponding data structures might become cyclic or remain acyclic. Consider line 20: if there is a path (of length 0 or more) from n to p, then the data structures bound to them become cyclic, while they remain acyclic in any other case. The present analysis is able to infer that n and p share before line 20, but n does not reach p, which, in turn, guarantees that no data structure ever becomes cyclic. It can be noted that reachability information is essential for proving acyclicity, since the mere information that p and n share, without knowing how they do, requires considering them as possibly cyclic, as done, for example, by Rossignoli and Spoto [26].
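The claim can also be tested on sample runs with a direct Python transcription of Figure 1. This sketch mirrors the concrete program, not the analysis; `is_cyclic` and `values` are hypothetical helpers, and `last_inserted` stands for the field lastInserted. Testing a few executions is, of course, no substitute for the static guarantee provided by the analysis.

```python
from typing import List, Optional


class Node:
    """Transcription of class Node (lines 27-30)."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.next: Optional["Node"] = None


class OrderedList:
    """Transcription of class OrderedList (lines 1-25)."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.last_inserted: Optional[Node] = None  # lastInserted in Figure 1

    def insert(self, i: int) -> int:
        pos = 0
        p: Optional[Node] = None              # local variables start as null
        n = Node(i)                           # lines 9-10
        c = self.head                         # line 11
        while c is not None and c.value < i:  # lines 12-16
            pos += 1
            p = c
            c = c.next
        n.next = c                            # line 17: n reaches c, but not p
        if p is None:
            self.head = n                     # line 19
        else:
            p.next = n                        # line 20: no cycle, n does not reach p
        self.last_inserted = n                # line 22
        return pos + 1                        # line 23


def is_cyclic(x: Optional[Node]) -> bool:
    """True if following `next` from x ever revisits a node."""
    seen = set()
    while x is not None:
        if id(x) in seen:
            return True
        seen.add(id(x))
        x = x.next
    return False


def values(x: Optional[Node]) -> List[int]:
    """The values stored in the list starting at x, in order."""
    out = []
    while x is not None:
        out.append(x.value)
        x = x.next
    return out


lst = OrderedList()
positions = [lst.insert(v) for v in [5, 1, 3, 9, 2]]
print(positions, values(lst.head), is_cyclic(lst.head))
# [1, 1, 2, 4, 2] [1, 2, 3, 5, 9] False
```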
Figure 2: A graphical representation of the data structure on which the example works: x points to an OrderedList object whose fields head and lastInserted point into a list of Node objects.

3. A simple object-oriented language 

This section defines the syntax and the denotational semantics of a simplified version of Java. Class, method, field, and variable names are taken from a set $\mathcal{X}$ of valid identifiers. A program consists of a set of classes $\mathcal{K} \subseteq \mathcal{X}$ ordered by the subclass relation $\preceq$. Following Java, a class declaration takes the form "class $\kappa_1$ [extends $\kappa_2$] { $t_1\ f_1; \ldots; t_n\ f_n;\ M_1 \ldots M_k$ }" where each "$t_i\ f_i$;" declares the field $f_i$ to have type $t_i \in \mathcal{K} \cup \{\mathtt{int}\}$, and each $M_i$ is a method definition. Similarly to Java, the optional statement "extends $\kappa_2$" declares $\kappa_1$ to be a subclass of $\kappa_2$. A method definition takes the form "$t\ m(t_1\ w_1, \ldots, t_n\ w_n)$ { $t_{n+1}\ w_{n+1}; \ldots; t_{n+p}\ w_{n+p};\ com$ }" where: $t \in \mathcal{K} \cup \{\mathtt{int}\}$ is the type of the return value; $w_1, \ldots, w_n \in \mathcal{X}$ are the formal parameters; $w_{n+1}, \ldots, w_{n+p} \in \mathcal{X}$ are local variables; and com is a sequence of instructions according to the following grammar:

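$$
\begin{array}{rcl}
exp &::=& n \mid \mathtt{null} \mid v \mid v.f \mid exp_1 \oplus exp_2 \mid \mathtt{new}\ \kappa \mid v_0.m(v_1, \ldots, v_n) \\
com &::=& v := exp \mid v.f := exp \mid com_1;\, com_2 \mid \mathtt{if}\ exp\ \mathtt{then}\ com_1\ \mathtt{else}\ com_2 \mid \mathtt{while}\ exp\ \mathtt{do}\ com \mid \mathtt{return}\ exp
\end{array}
$$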

where $v, v_0, \ldots, v_n, m, f \in \mathcal{X}$; $n \in \mathbb{Z}$; $\kappa \in \mathcal{K}$; and $\oplus$ is a binary operator (Boolean operators return 1 for true and 0 for false). For simplicity, and without loss of generality, conditions in if and while statements are assumed not to create objects or call methods. A method signature $\kappa.m(t_1, \ldots, t_n){:}t$ refers to a method m defined in class $\kappa$, taking n parameters of type $t_1, \ldots, t_n \in \mathcal{K} \cup \{\mathtt{int}\}$, and returning a value of type t. Given a method signature m, let $m^b$ be its code com; $m^i$ its set of input variables $\{\mathit{this}, w_1, \ldots, w_n\}$; $m^l$ its set of local variables $\{w_{n+1}, \ldots, w_{n+p}\}$; and $m^s = m^i \cup m^l$.

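A type environment $\tau$ is a partial map from $\mathcal{X}$ to $\mathcal{K} \cup \{\mathtt{int}\}$ which associates types to variables at a given program point. Abusing notation, when the context is clear, type environments will be confused with sets of variables; i.e., the partial map will be confused with its domain when the type of variables can be ignored. A state over $\tau$ is a pair consisting of a frame and a heap. A heap is a partial mapping from an infinite and totally ordered set $\mathcal{L}$ of memory locations to objects; $\mu(\ell)$ is the object bound to $\ell \in \mathcal{L}$ in the heap $\mu$. An object $o \in \mathcal{O}$ is a pair consisting of a class tag $o.\mathit{tag} \in \mathcal{K}$, and a frame $o.\mathit{frm}$ which maps its fields into $\mathcal{V} = \mathbb{Z} \cup \mathcal{L} \cup \{\mathtt{null}\}$. Shorthand is used: $o.f$ for $o.\mathit{frm}(f)$; $\mu[\ell \mapsto o]$ to modify the heap $\mu$ such that a new location $\ell$ points to object $o$; and $\mu[\ell.f \mapsto \mathbf{v}]$ to modify the value of the field $f$ of the object $\mu(\ell)$ to $\mathbf{v} \in \mathcal{V}$. A frame $\phi$ maps variables in $\mathrm{dom}(\tau)$ to $\mathcal{V}$. For $v \in \mathrm{dom}(\tau)$, $\phi(v)$ refers to the value of $v$, and $\phi[v \mapsto \mathbf{v}]$ is the frame where the value of $v$ has been set to $\mathbf{v}$, or defined to be $\mathbf{v}$ if $v \notin \mathrm{dom}(\phi)$. The set of possible states over $\tau$ is

$$
\Sigma_\tau = \left\{ \langle \phi, \mu \rangle \;\middle|\; \begin{array}{l}
1.\ \phi \text{ is a frame over } \tau,\ \mu \text{ is a heap, and both are well-typed} \\
2.\ \mathrm{rng}(\phi) \cap \mathcal{L} \subseteq \mathrm{dom}(\mu) \\
3.\ \forall \ell \in \mathrm{dom}(\mu).\ \mathrm{rng}(\mu(\ell).\mathit{frm}) \cap \mathcal{L} \subseteq \mathrm{dom}(\mu)
\end{array} \right\}
$$

Given $\sigma \in \Sigma_\tau$, $\sigma^{\phi}$ and $\sigma^{\mu}$ refer to its frame and its heap, respectively. The complete lattice $\langle \wp(\Sigma_\tau), \subseteq, \emptyset, \Sigma_\tau, \cap, \cup \rangle$ defines the concrete computation domain.

A denotation $\delta$ over two type environments $\tau_1$ and $\tau_2$ is a partial map from $\Sigma_{\tau_1}$ to $\Sigma_{\tau_2}$: it basically describes how the state changes when a piece of code is executed. The set of denotations from $\tau_1$ to $\tau_2$ is $\Delta(\tau_1, \tau_2)$. Interpretations are special denotations which give a meaning to methods in terms of their input and output variables. An interpretation $\iota \in \Gamma$ maps methods to denotations, and is such that $\iota(m) \in \Delta(m^i, \{\mathit{out}\})$ for each signature $m$ in the program. Note that the variable $\mathit{out}$ is a special variable which will be used to denote the return value of a method.

$$
\begin{array}{rcl}
\mathcal{E}^{\iota}[\![n]\!](\sigma) &=& \langle \sigma^{\phi}[\rho \mapsto n],\ \sigma^{\mu} \rangle \\
\mathcal{E}^{\iota}[\![\mathtt{null}]\!](\sigma) &=& \langle \sigma^{\phi}[\rho \mapsto \mathtt{null}],\ \sigma^{\mu} \rangle \\
\mathcal{E}^{\iota}[\![v]\!](\sigma) &=& \langle \sigma^{\phi}[\rho \mapsto \sigma^{\phi}(v)],\ \sigma^{\mu} \rangle \\
\mathcal{E}^{\iota}[\![\mathtt{new}\ \kappa]\!](\sigma) &=& \langle \sigma^{\phi}[\rho \mapsto \ell],\ \sigma^{\mu}[\ell \mapsto \mathit{newobj}(\kappa)] \rangle \text{ where } \ell = \mathit{newloc}(\sigma^{\mu}) \\
\mathcal{E}^{\iota}[\![v.f]\!](\sigma) &=& \langle \sigma^{\phi}[\rho \mapsto \sigma^{\mu}(\sigma^{\phi}(v)).f],\ \sigma^{\mu} \rangle \\
\mathcal{E}^{\iota}[\![exp_1 \oplus exp_2]\!](\sigma) &=& \langle \sigma^{\phi}[\rho \mapsto \sigma_1^{\phi}(\rho) \oplus \sigma_2^{\phi}(\rho)],\ \sigma_2^{\mu} \rangle \\
&& \text{where } \sigma_1 = \mathcal{E}^{\iota}[\![exp_1]\!](\sigma) \text{ and } \sigma_2 = \mathcal{E}^{\iota}[\![exp_2]\!](\langle \sigma^{\phi}, \sigma_1^{\mu} \rangle) \\
\mathcal{E}^{\iota}[\![v_0.m(v_1, \ldots, v_n)]\!](\sigma) &=& \langle \sigma^{\phi}[\rho \mapsto \sigma_2^{\phi}(\mathit{out})],\ \sigma_2^{\mu} \rangle \text{ where } \sigma_2 = \iota(m)(\sigma_1) \text{ and } \sigma_1 \text{ is such that} \\
&& 1.\ \sigma_1^{\mu} = \sigma^{\mu};\quad 2.\ \sigma_1^{\phi}(\mathit{this}) = \sigma^{\phi}(v_0); \\
&& 3.\ \forall 1 \le i \le n.\ \sigma_1^{\phi}(w_i) = \sigma^{\phi}(v_i); \text{ and } 4.\ m = \mathit{lkp}(\sigma, v_0.m(v_1, \ldots, v_n)) \\[4pt]
\mathcal{C}^{\iota}_{\tau}[\![v := exp]\!](\sigma) &=& \langle \sigma^{\phi}[v \mapsto \sigma_e^{\phi}(\rho)],\ \sigma_e^{\mu} \rangle \\
\mathcal{C}^{\iota}_{\tau}[\![v.f := exp]\!](\sigma) &=& \langle \sigma^{\phi},\ \sigma_e^{\mu}[\ell.f \mapsto \sigma_e^{\phi}(\rho)] \rangle \text{ where } \ell = \sigma^{\phi}(v) \\
\mathcal{C}^{\iota}_{\tau}[\![\mathtt{if}\ exp\ \mathtt{then}\ com_1\ \mathtt{else}\ com_2]\!](\sigma) &=& \text{if } \sigma_e^{\phi}(\rho) \neq 0 \text{ then } \mathcal{C}^{\iota}_{\tau}[\![com_1]\!](\sigma) \text{ else } \mathcal{C}^{\iota}_{\tau}[\![com_2]\!](\sigma) \\
\mathcal{C}^{\iota}_{\tau}[\![\mathtt{while}\ exp\ \mathtt{do}\ com]\!](\sigma) &=& \delta(\sigma) \text{ where } \delta \text{ is the least fixpoint of} \\
&& \lambda w.\lambda\sigma.\ \text{if } \sigma_e^{\phi}(\rho) \neq 0 \text{ then } w(\mathcal{C}^{\iota}_{\tau}[\![com]\!](\sigma)) \text{ else } \sigma \\
\mathcal{C}^{\iota}_{\tau}[\![\mathtt{return}\ exp]\!](\sigma) &=& \langle \sigma^{\phi}[\mathit{out} \mapsto \sigma_e^{\phi}(\rho)],\ \sigma_e^{\mu} \rangle \\
\mathcal{C}^{\iota}_{\tau}[\![com_1;\, com_2]\!](\sigma) &=& \mathcal{C}^{\iota}_{\tau}[\![com_2]\!](\mathcal{C}^{\iota}_{\tau}[\![com_1]\!](\sigma))
\end{array}
$$

Figure 3: Denotations for expressions and commands. The state $\sigma_e$ is $\mathcal{E}^{\iota}[\![exp]\!](\sigma)$.

Denotations for expressions and commands are depicted in Figure 3. An expression denotation $\mathcal{E}^{\iota}[\![exp]\!]$ maps states from $\Sigma_\tau$ to states from $\Sigma_{\tau \cup \{\rho\}}$, where $\rho$ is a special variable for storing the expression value. A command denotation $\mathcal{C}^{\iota}_{\tau}[\![com]\!]$ maps states to states, in the presence of $\iota \in \Gamma$. The function $\mathit{newobj}(\kappa)$ creates a new instance of the class $\kappa$ with integer fields initialized to 0 and reference fields initialized to null, while $\mathit{newloc}(\sigma^{\mu})$ returns the first free location, i.e., the first $\ell \notin \mathrm{dom}(\sigma^{\mu})$ according to the total ordering on locations. The function $\mathit{lkp}$ resolves the method call and returns the signature of the method to be called. The concrete denotational semantics of a program is defined as the least fixpoint of the following transformer of interpretations [8].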

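Definition 3.1. The denotational semantics of a program P is the least fixpoint ($\mathit{lfp}$) of the following operator:

$$
T_P(\iota) = \left\{ m \mapsto \lambda\sigma \in \Sigma_{m^i}.\ \exists_{\setminus \mathit{out}}.\ \mathcal{C}^{\iota}_{m^s \cup \{\mathit{out}\}}[\![m^b]\!](\mathit{extend}(\sigma, m)) \;\middle|\; m \in P \right\}
$$

where $\mathit{extend}(\sigma, m) = \langle \sigma^{\phi}[\forall v \in m^l \cup \{\mathit{out}\}.\ v \mapsto 0/\mathtt{null}],\ \sigma^{\mu} \rangle$.

The denotation for a method signature $m \in P$ is computed by the above operator as follows: (1) it extends (using $\mathit{extend}(\sigma, m)$) the input state $\sigma \in \Sigma_{m^i}$ such that local variables (and $\mathit{out}$) are set to 0 or null, depending on their type; (2) it computes the denotation of the code of m (using $\mathcal{C}^{\iota}_{m^s \cup \{\mathit{out}\}}[\![m^b]\!]$); and (3) it restricts the resulting denotation to the output variable $\mathit{out}$ (using $\exists_{\setminus \mathit{out}}$).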

4. The abstract domain 

The acyclicity analysis discussed in this paper works on the reduced product [12] of two abstract domains, according to the theory of Abstract Interpretation [11]. The first domain captures may-reachability, while the second deals with the may-be-cyclic property of variables. Both are based on the notion of reachable heap locations, i.e., the part of the heap which can be reached from a location by accessing object fields.

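Definition 4.1 (reachable heap locations [26]). Given a heap $\mu$, the set of reachable locations from $\ell \in \mathrm{dom}(\mu)$ is $R(\mu, \ell) = \bigcup\{R^i(\mu, \ell) \mid i \geq 0\}$, where $R^0(\mu, \ell) = \mathrm{rng}(\mu(\ell).\mathit{frm}) \cap \mathcal{L}$, and $R^{i+1}(\mu, \ell) = \bigcup\{\mathrm{rng}(\mu(\ell').\mathit{frm}) \cap \mathcal{L} \mid \ell' \in R^i(\mu, \ell)\}$. The set of $\varepsilon$-reachable locations from $\ell \in \mathrm{dom}(\mu)$ is $R^{\varepsilon}(\mu, \ell) = R(\mu, \ell) \cup \{\ell\}$.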

Note that $\varepsilon$-reachable locations include the source location $\ell$ itself, while reachable locations do not (unless $\ell$ is reachable from itself through a cycle whose length is at least 1). The rest of this section is developed in the context of a given type environment $\tau$.
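Definition 4.1 can be turned into a small fixpoint computation. The following sketch assumes a hypothetical heap model in which locations are strings, integers are plain Python ints, and each location maps to a dictionary from field names to values; class tags are omitted.

```python
from typing import Dict, Set, Union

Loc = str                           # hypothetical: locations are modelled as strings
Value = Union[int, Loc, None]       # V = Z ∪ L ∪ {null}
Heap = Dict[Loc, Dict[str, Value]]  # mu: location -> field frame (class tags omitted)


def successors(heap: Heap, loc: Loc) -> Set[Loc]:
    """rng(mu(loc).frm) ∩ L: the locations stored in the fields of mu(loc)."""
    return {val for val in heap[loc].values() if isinstance(val, str)}


def reachable(heap: Heap, loc: Loc) -> Set[Loc]:
    """R(mu, loc): locations reachable from loc in one or more dereferences."""
    result: Set[Loc] = set()
    frontier = successors(heap, loc)  # R^0(mu, loc)
    while frontier:
        result |= frontier
        # Expand only newly found locations; the union equals that of all R^i.
        frontier = {nxt for src in frontier for nxt in successors(heap, src)} - result
    return result


def eps_reachable(heap: Heap, loc: Loc) -> Set[Loc]:
    """R^eps(mu, loc) = R(mu, loc) ∪ {loc}."""
    return reachable(heap, loc) | {loc}


heap: Heap = {
    "l1": {"value": 1, "next": "l2"},
    "l2": {"value": 2, "next": None},
    "l3": {"f": "l3"},  # an object whose field points back to itself
}
print(sorted(reachable(heap, "l1")))      # ['l2']
print(sorted(reachable(heap, "l2")))      # []
print(sorted(eps_reachable(heap, "l2")))  # ['l2']
print(sorted(reachable(heap, "l3")))      # ['l3']
```

As the note above states, `l2` is $\varepsilon$-reachable but not reachable from itself, whereas `l3` is reachable from itself through a cycle of length 1.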

4.1. Reachability

Given a state $\sigma \in \Sigma_\tau$, a reference variable $v \in \tau$ is said to reach a reference variable $w \in \tau$ in $\sigma$ if $\sigma^{\phi}(w) \in R(\sigma^{\mu}, \sigma^{\phi}(v))$. This means that, starting from v and applying at least one dereference operation (i.e., going from the location pointed to by v to the location pointed to by v.f for some field f), it is possible to reach the object to which w points. Due to strong typing, $\tau$ puts some restrictions on reachability; i.e., it might be impossible to have a heap where a variable of type $\kappa_1$ reaches one of type $\kappa_2$. Following Secci and Spoto [28], a class $\kappa_2 \in \mathcal{K}$ is said to be reachable from $\kappa_1 \in \mathcal{K}$ if there exist $\sigma \in \Sigma_\tau$ and two locations $\ell, \ell' \in \mathrm{dom}(\sigma^{\mu})$ such that (a) $\sigma^{\mu}(\ell).\mathit{tag} = \kappa_1$; (b) $\sigma^{\mu}(\ell').\mathit{tag} = \kappa_2$; and (c) $\ell' \in R(\sigma^{\mu}, \ell)$. The use of this notion (as well as of the notion of cyclic class introduced in Section 4.2 and used in Definition 4.5) in the definition of the reachability and cyclicity domains allows us to obtain the needed Galois insertions. It must be pointed out that both notions can be computed statically, so that they can be assumed to be pre-computed information.

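Definition 4.2 (reachability domain). The reachability abstract domain is the complete lattice $\mathcal{I}_r = \langle \wp(\mathcal{R}^\tau), \subseteq, \emptyset, \mathcal{R}^\tau, \cap, \cup \rangle$, where

$$
\mathcal{R}^\tau = \left\{ v \rightsquigarrow w \;\middle|\; \begin{array}{l}
v, w \in \mathrm{dom}(\tau), \text{ and there exist } \kappa_1 \preceq \tau(v) \text{ and } \kappa_2 \preceq \tau(w) \\
\text{such that } \kappa_2 \text{ is reachable from } \kappa_1
\end{array} \right\}
$$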

Here and in the following, the elements of the tuple $\langle A, \leq, \bot, \top, \wedge, \vee \rangle$ denoting an abstract domain $A$ represent, respectively, ($A$) the set of abstract values, ($\leq$) the partial order on them, ($\bot$) the minimal (bottom) element of $A$, ($\top$) the maximal (top) element of $A$, ($\wedge$) the meet operator and ($\vee$) the join operator on $A$. This terminology is standard in Abstract Interpretation.

May-reach information is described by abstract values $I_r \in \wp(\mathcal{R}^\tau)$. For example, $\{x \rightsquigarrow z, y \rightsquigarrow z\}$ describes those states where x and y may reach z. Note that a statement $x \rightsquigarrow y$ does not prevent x and y from aliasing; instead, x can reach y and alias with it at the same time, e.g., when x, y, and x.f point to the same location.

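Lemma 4.3. The following abstraction and concretization functions define a Galois insertion between $\mathcal{I}_r$ and $\wp(\Sigma_\tau)$:

$$\alpha_r(X) = \{ v \rightsquigarrow w \in \mathcal{R}^\tau \mid \exists \sigma \in X.\ v \text{ reaches } w \text{ in } \sigma \}$$

$$\gamma_r(I_r) = \{ \sigma \in \Sigma_\tau \mid \forall v, w \in \tau.\ v \text{ reaches } w \text{ in } \sigma \Rightarrow v \rightsquigarrow w \in I_r \}$$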

The top element $\mathcal{R}^\tau$ is $\alpha_r(\Sigma_\tau)$, and represents all states which are compatible with $\tau$. This is because the presence of a reachability statement in an abstract value $I$ does not require a reachability path to actually exist; rather, the concretization of $I$ will include states where the path does exist, and states where it does not (this is the meaning of "may-information"). Conversely, the absence of a reachability statement from an abstract value requires the non-existence of the corresponding reachability path in every state of its concretization.

The bottom element $\emptyset$ models the set of all states where, for every two
reference variables v and w (possibly the same variable), v does not reach w. 
Note that, clearly, this set is not empty, and that the absence of a reachability 
statement actually rules out states where the reachability path exists. 
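A sketch of $\alpha_r$ and of membership in $\gamma_r$ on finite sets of states uses the same hypothetical dictionary-based heap model (a pair `(v, w)` stands for $v \rightsquigarrow w$, and the typing restriction built into $\mathcal{R}^\tau$ is ignored). It also illustrates that x can reach y and alias with it at the same time:

```python
from itertools import product
from typing import Dict, Iterable, Set, Tuple, Union

Loc = str
Value = Union[int, Loc, None]
Heap = Dict[Loc, Dict[str, Value]]
Frame = Dict[str, Value]
State = Tuple[Frame, Heap]
Stmt = Tuple[str, str]  # (v, w) encodes the reachability statement v ⇝ w


def reachable(heap: Heap, loc: Loc) -> Set[Loc]:
    """R(mu, loc) from Definition 4.1."""
    result: Set[Loc] = set()
    frontier = {val for val in heap[loc].values() if isinstance(val, str)}
    while frontier:
        result |= frontier
        frontier = {val for src in frontier for val in heap[src].values()
                    if isinstance(val, str)} - result
    return result


def reaches(state: State, v: str, w: str) -> bool:
    """v reaches w in sigma iff sigma^phi(w) ∈ R(sigma^mu, sigma^phi(v))."""
    frame, heap = state
    lv, lw = frame.get(v), frame.get(w)
    return isinstance(lv, str) and isinstance(lw, str) and lw in reachable(heap, lv)


def alpha_r(states: Iterable[State], variables: Iterable[str]) -> Set[Stmt]:
    """alpha_r(X): every v ⇝ w that holds in at least one state of X."""
    vs = list(variables)
    return {(v, w) for st in states for v, w in product(vs, vs) if reaches(st, v, w)}


def in_gamma_r(state: State, i_r: Set[Stmt]) -> bool:
    """sigma ∈ gamma_r(I_r) iff every actual reachability is recorded in I_r."""
    vs = list(state[0])
    return all((v, w) in i_r for v, w in product(vs, vs) if reaches(state, v, w))


# x, y and x.f all point to the same location: x reaches y and aliases it.
sigma: State = ({"x": "l", "y": "l"}, {"l": {"f": "l"}})
print(sorted(alpha_r([sigma], ["x", "y"])))
# [('x', 'x'), ('x', 'y'), ('y', 'x'), ('y', 'y')]
print(in_gamma_r(sigma, {("x", "y")}))  # False: x ⇝ x, y ⇝ x, y ⇝ y are missing
```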

Remark 4.4. Intuitively, reachability is a transitive property; i.e., if x reaches y and y reaches z, then x also reaches z. However, values in $\mathcal{I}_r$ are not closed under transitivity: e.g., it is possible to have $I_r = \{x \rightsquigarrow y, y \rightsquigarrow z\}$, which contains $x \rightsquigarrow y$ and $y \rightsquigarrow z$, but not $x \rightsquigarrow z$. Such an abstract value is a reasonable one, and approximates, for example, the execution of the following code

    x:=new C;
    y:=new C;
    if (w>0) then x.f:=y; else y.f:=z;

Moreover, this abstract value is consistent, i.e., it describes a set of concrete states which is not smaller (actually, it is larger) than $\gamma_r(\emptyset)$. This happens because reachability is, actually, may-reach information, so that, for example, $\gamma_r(\{x \rightsquigarrow y, y \rightsquigarrow z\})$ includes (a) any state where x reaches y but y does not reach z; (b) any state where y reaches z but x does not reach y; and (c) any state where x does not reach y and y does not reach z. It is important to point out that $\gamma_r(\{x \rightsquigarrow y, y \rightsquigarrow z\})$ does not contain those states where both x reaches y and y reaches z, since, in this case, x would also reach z by transitivity, which is excluded by the definition of $\gamma_r$, as $x \rightsquigarrow z \notin I_r$.
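The remark can be checked on the two possible executions of the snippet. In this sketch, `state_after` hard-codes the final state of each branch under the hypothetical assumption that z initially points to an object of its own. $\alpha_r$ of the two final states contains $x \rightsquigarrow y$ and $y \rightsquigarrow z$ but not $x \rightsquigarrow z$, while a state where x reaches z through y falls outside $\gamma_r$ of that value:

```python
from itertools import product
from typing import Dict, Set, Tuple, Union

Loc = str
Value = Union[int, Loc, None]
Heap = Dict[Loc, Dict[str, Value]]
State = Tuple[Dict[str, Value], Heap]


def reachable(heap: Heap, loc: Loc) -> Set[Loc]:
    """R(mu, loc) from Definition 4.1."""
    result: Set[Loc] = set()
    frontier = {val for val in heap[loc].values() if isinstance(val, str)}
    while frontier:
        result |= frontier
        frontier = {val for src in frontier for val in heap[src].values()
                    if isinstance(val, str)} - result
    return result


def reach_pairs(state: State) -> Set[Tuple[str, str]]:
    """All (v, w) such that v reaches w in the given state."""
    frame, heap = state
    refs = [v for v, val in frame.items() if isinstance(val, str)]
    return {(v, w) for v, w in product(refs, refs)
            if frame[w] in reachable(heap, frame[v])}


def state_after(branch: str) -> State:
    """Final state of `x:=new C; y:=new C; if (w>0) then x.f:=y else y.f:=z`,
    assuming (hypothetically) that z initially points to its own object lz."""
    heap: Heap = {"lx": {"f": None}, "ly": {"f": None}, "lz": {"f": None}}
    if branch == "then":
        heap["lx"]["f"] = "ly"  # x.f:=y
    else:
        heap["ly"]["f"] = "lz"  # y.f:=z
    frame = {"x": "lx", "y": "ly", "z": "lz", "w": 1 if branch == "then" else 0}
    return frame, heap


# alpha_r of the two possible final states is the union of their reach pairs.
abstract = reach_pairs(state_after("then")) | reach_pairs(state_after("else"))
print(sorted(abstract))  # [('x', 'y'), ('y', 'z')]
```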

4.2. Cyclicity 

Given a state $\sigma \in \Sigma_\tau$, a variable $v \in \mathrm{dom}(\tau)$ is said to be cyclic in $\sigma$ if there exists $\ell \in R^{\varepsilon}(\sigma^{\mu}, \sigma^{\phi}(v))$ such that $\ell \in R(\sigma^{\mu}, \ell)$. In other words, v is cyclic if it reaches some memory location $\ell$ (which can possibly be $\sigma^{\phi}(v)$ itself) through which a cyclic path goes. Similarly to reachability, it might be impossible to generate a cyclic data structure starting from a variable of some type $\kappa$. A class $\kappa \in \mathcal{K}$ is said to be a cyclic class if there exist $\sigma \in \Sigma_\tau$ and $\ell, \ell' \in \mathrm{dom}(\sigma^{\mu})$ such that $\sigma^{\mu}(\ell).\mathit{tag} = \kappa$, $\ell' \in R^{\varepsilon}(\sigma^{\mu}, \ell)$, and $\ell' \in R(\sigma^{\mu}, \ell')$. The cyclicity domain is the dual of the non-cyclicity domain by Rossignoli and Spoto [26].
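The definition of cyclic variables can be sketched on the same hypothetical dictionary-based heap model. The example heap shows a variable that is cyclic although it does not reach itself, because the cycle does not pass through the location it points to:

```python
from typing import Dict, Set, Union

Loc = str
Value = Union[int, Loc, None]
Heap = Dict[Loc, Dict[str, Value]]


def reachable(heap: Heap, loc: Loc) -> Set[Loc]:
    """R(mu, loc) from Definition 4.1."""
    result: Set[Loc] = set()
    frontier = {val for val in heap[loc].values() if isinstance(val, str)}
    while frontier:
        result |= frontier
        frontier = {val for src in frontier for val in heap[src].values()
                    if isinstance(val, str)} - result
    return result


def is_cyclic(frame: Dict[str, Value], heap: Heap, v: str) -> bool:
    """v is cyclic iff some loc in R^eps(mu, phi(v)) is reachable from itself."""
    start = frame.get(v)
    if not isinstance(start, str):
        return False  # null and integer variables cannot be cyclic
    candidates = reachable(heap, start) | {start}  # R^eps(mu, phi(v))
    return any(loc in reachable(heap, loc) for loc in candidates)


# a -> b -> c -> b: x is cyclic although it does not reach itself.
heap: Heap = {"a": {"next": "b"}, "b": {"next": "c"}, "c": {"next": "b"},
              "d": {"next": None}}
frame: Dict[str, Value] = {"x": "a", "y": "d", "z": None}
print(is_cyclic(frame, heap, "x"), "a" in reachable(heap, "a"))  # True False
print(is_cyclic(frame, heap, "y"), is_cyclic(frame, heap, "z"))  # False False
```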

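Definition 4.5 (cyclicity domain). The abstract domain for cyclicity is represented as the complete lattice $\mathcal{I}_c = \langle \wp(\mathcal{C}^\tau), \subseteq, \emptyset, \mathcal{C}^\tau, \cap, \cup \rangle$ where

$$\mathcal{C}^\tau = \{ \circlearrowleft^{v} \mid v \in \tau, \text{ and there exists a cyclic class } \kappa \preceq \tau(v) \}$$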

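Lemma 4.6. The following abstraction and concretization functions define a Galois insertion between $\mathcal{I}_c$ and $\wp(\Sigma_\tau)$:

$$\alpha_c(X) = \{ \circlearrowleft^{v} \mid v \in \tau,\ \exists \sigma \in X.\ v \text{ is cyclic in } \sigma \}$$

$$\gamma_c(I_c) = \{ \sigma \in \Sigma_\tau \mid \forall v \in \tau.\ v \text{ is cyclic in } \sigma \Rightarrow \circlearrowleft^{v} \in I_c \}$$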

May-be-cyclic information is described by abstract values $I_c \in \wp(\mathcal{C}^\tau)$. For instance, $\{\circlearrowleft^{x}\}$ represents states where no variable but x can be cyclic. The top element is concretized to $\Sigma_\tau$; i.e., all states are included since each variable can be either cyclic or acyclic. The bottom element does not allow any variable to be cyclic, i.e., its concretization does not include any state with cyclic variables.
4.3. The reduced product

As will be explained in Section 5, the abstract semantics uses reachability information in order to detect cycles, and cyclicity information in order to add, in some cases, reachability statements. Both kinds of information can be combined: in the theory of Abstract Interpretation, this amounts to computing the reduced product [12] of the corresponding abstract domains. In the present context, the reduced product is obtained by reducing the Cartesian product $\mathcal{I}_{rc} = \mathcal{I}_r \times \mathcal{I}_c$. Elements of $\mathcal{I}_{rc}$ are pairs $\langle I_r, I_c \rangle$, where $I_r$ and $I_c$ contain, respectively, the may-reach and the may-be-cyclic information. The abstraction and concretization functions are induced by those on $\mathcal{I}_r$ and $\mathcal{I}_c$:

$$\gamma_{rc}(\langle I_r, I_c \rangle) = \gamma_r(I_r) \cap \gamma_c(I_c) \qquad \alpha_{rc}(X) = \langle \alpha_r(X), \alpha_c(X) \rangle$$

However, it can happen that two elements of $\mathcal{I}_{rc}$ are mapped to the same set of concrete states, which prevents having a Galois insertion between $\mathcal{I}_{rc}$ and $\wp(\Sigma_\tau)$. The operation of reduction deals exactly with this problem. In order to compute it, an equivalence relation $\equiv$ has to be defined, which satisfies $I^1_{rc} \equiv I^2_{rc}$ if and only if $\gamma_{rc}(I^1_{rc}) = \gamma_{rc}(I^2_{rc})$. Functions $\gamma_{rc}$ and $\alpha_{rc}$ define a Galois insertion between $\mathcal{I}_{rc}^{\equiv} = \mathcal{I}_{rc}/_{\equiv}$ and $\wp(\Sigma_\tau)$, where $\mathcal{I}_{rc}^{\equiv}$ is $\mathcal{I}_{rc}$ equipped (reduced) with the equivalence relation. The following lemma characterizes the equivalence relation on $\mathcal{I}_{rc}$.

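Lemma 4.7. For any abstract values $I^1_r, I^2_r \in \mathcal{I}_r$ and $I^1_c, I^2_c \in \mathcal{I}_c$, the concretization $\gamma_{rc}(\langle I^1_r, I^1_c \rangle)$ is equal to $\gamma_{rc}(\langle I^2_r, I^2_c \rangle)$ if and only if both conditions hold: (a) $I^1_c = I^2_c$; and (b) $I^1_r \setminus \{v \rightsquigarrow v \mid \circlearrowleft^{v} \notin I^1_c\} = I^2_r \setminus \{v \rightsquigarrow v \mid \circlearrowleft^{v} \notin I^2_c\}$.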

The above lemma means that: (a) may-be-cyclic information always makes a difference as regards the set of concrete states; that is, adding a new statement $\circlearrowleft^{v}$ to $I_{rc} \in \mathcal{I}_{rc}$ results in representing a strictly larger set of states; and (b) adding a pair $v \rightsquigarrow v$ to $I_{rc} \in \mathcal{I}_{rc}$, when v cannot be cyclic, does not make it represent more concrete states, since the acyclicity of v excludes that it can reach itself.

Example 4.8. As an example for case (a), consider two abstract values $I^1_{rc} = \langle I_r, \emptyset \rangle$ and $I^2_{rc} = \langle I_r, \{\circlearrowleft^{x}\} \rangle$, the latter resulting from adding $\circlearrowleft^{x}$ to $I^1_{rc}$. Assuming that x does not appear in $I_r$, there is a state $\sigma$ which is compatible with $I_r$ (for example, if no v reaches any w in $\sigma$), and where x is cyclic (note that this does not require x to reach any other variable, not even itself, since the cycle does not need to go through $\sigma^{\phi}(x)$). This $\sigma$ belongs to $\gamma_{rc}(I^2_{rc}) \setminus \gamma_{rc}(I^1_{rc})$ and is, therefore, an example of the difference between the abstract values.

As an example for (b), consider $I^1_{rc} = \langle \emptyset, \{\circlearrowleft^{y}\} \rangle$ and $I^2_{rc} = \langle \{x \rightsquigarrow x\}, \{\circlearrowleft^{y}\} \rangle$, which results from adding $x \rightsquigarrow x$ to $I^1_{rc}$. At first glance, $I^2_{rc}$ describes a larger set of states, since it includes states (not belonging to $\gamma_{rc}(I^1_{rc})$) where there is a path from x to x. However, such states do not belong to $\gamma_{rc}(I^2_{rc})$ either, since such a path implies that x is cyclic, which is not permitted by $\{\circlearrowleft^{y}\}$, which only allows y to be cyclic.

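Lemma 4.7 provides a way for computing the normal form of any $\langle I_r, I_c \rangle$, which turns out to be $\langle I_r \setminus \{v \rightsquigarrow v \mid \circlearrowleft^{v} \notin I_c\}, I_c \rangle$, i.e., the canonical form of its equivalence class. From now on, $\mathcal{I}_{rc}$ will be a shorthand for $\mathcal{I}_{rc}^{\equiv}$, where $\equiv$ is left implicit.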
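The normal form is straightforward to compute. In this sketch (a hypothetical encoding, where $I_c$ is represented by the set of variables v such that $\circlearrowleft^{v} \in I_c$), two elements of $\mathcal{I}_r \times \mathcal{I}_c$ are equivalent exactly when their normal forms coincide, and the two cases of Example 4.8 behave as described:

```python
from typing import FrozenSet, Set, Tuple

Stmt = Tuple[str, str]  # (v, w) stands for the reachability statement v ⇝ w
Normal = Tuple[FrozenSet[Stmt], FrozenSet[str]]  # I_c holds v for each ⟲^v


def normal_form(i_r: Set[Stmt], i_c: Set[str]) -> Normal:
    """Drop v ⇝ v whenever ⟲^v is not in I_c (condition (b) of Lemma 4.7)."""
    kept = {(v, w) for (v, w) in i_r if v != w or v in i_c}
    return frozenset(kept), frozenset(i_c)


def equivalent(a: Tuple[Set[Stmt], Set[str]], b: Tuple[Set[Stmt], Set[str]]) -> bool:
    """Lemma 4.7: same concretization iff the normal forms coincide."""
    return normal_form(*a) == normal_form(*b)


# Example 4.8 (b): adding x ⇝ x is irrelevant when only y may be cyclic.
print(equivalent((set(), {"y"}), ({("x", "x")}, {"y"})))  # True
# Example 4.8 (a): adding ⟲^x always changes the concretization.
print(equivalent((set(), set()), (set(), {"x"})))         # False
```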
5. Reachability-based acyclicity analysis

This section uses $\mathcal{I}_{rc}$ to define an abstract semantics from which one can decide whether a variable v is guaranteed to be bound to an acyclic data structure at a given program point. Informally, two variables v and w are said to share in a state $\sigma$ if they $\varepsilon$-reach (i.e., in zero or more steps) a common location in the heap. The analysis is based on the observation that reachability information can tell how v and w share: this can happen because either (a) v and w alias; (b) v reaches w; (c) w reaches v; or (d) they both reach a common location $\ell \in \mathrm{dom}(\sigma^{\mu})$. Distinguishing among these four possibilities is crucial for a precise acyclicity analysis. In fact, assuming that v and w are initially acyclic, they both become cyclic after executing v.f:=w if and only if, initially, w either reaches v or aliases with it. This is clearly more precise than declaring v as cyclic whenever it shares with w [26]. The presented analysis is an adaptation of the work by Ghiya and Hendren [17] to an object-oriented framework, where the chosen formalism is that of an abstract semantics on the domain described in Section 4. Some optimizations w.r.t. the original analysis are also discussed.
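The claim about v.f:=w can be checked on concrete heaps. The sketch below, again over a hypothetical dictionary-based heap, performs the update on a copy of the heap and compares the prediction "w aliases or reaches v" with the actual cyclicity of v afterwards; the first case is the list-removal pattern y:=x.next.next; x.next:=y from Section 1:

```python
from typing import Dict, Set, Union

Loc = str
Value = Union[int, Loc, None]
Heap = Dict[Loc, Dict[str, Value]]
Frame = Dict[str, Value]


def reachable(heap: Heap, loc: Loc) -> Set[Loc]:
    """R(mu, loc) from Definition 4.1."""
    result: Set[Loc] = set()
    frontier = {val for val in heap[loc].values() if isinstance(val, str)}
    while frontier:
        result |= frontier
        frontier = {val for src in frontier for val in heap[src].values()
                    if isinstance(val, str)} - result
    return result


def is_cyclic(frame: Frame, heap: Heap, v: str) -> bool:
    """Cyclicity of v as defined in Section 4.2."""
    start = frame.get(v)
    if not isinstance(start, str):
        return False
    return any(loc in reachable(heap, loc) for loc in reachable(heap, start) | {start})


def field_update(frame: Frame, heap: Heap, v: str, f: str, w: str) -> Heap:
    """Concrete v.f:=w, applied to a copy of the heap (v must be non-null)."""
    new_heap = {loc: dict(fields) for loc, fields in heap.items()}
    new_heap[frame[v]][f] = frame[w]
    return new_heap


def w_eps_reaches_v(frame: Frame, heap: Heap, v: str, w: str) -> bool:
    """True iff w aliases v or w reaches v (cases (a) and (c))."""
    lv, lw = frame[v], frame[w]
    return isinstance(lw, str) and (lv == lw or lv in reachable(heap, lw))


# x -> a -> b -> c, y points to c, u -> e -> a; x, y and u are all acyclic.
base: Heap = {"a": {"next": "b"}, "b": {"next": "c"}, "c": {"next": None},
              "e": {"next": "a"}}
frame: Frame = {"x": "a", "y": "c", "u": "e"}
for w in ("y", "x", "u"):
    after = field_update(frame, base, "x", "next", w)
    print(w, w_eps_reaches_v(frame, base, "x", w), is_cyclic(frame, after, "x"))
# y False False
# x True True
# u True True
```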

The rest of this section formalizes the reachability-based analysis as an abstract semantics on $\mathcal{I}_{rc}$, and proves some important results.

5.1. Preliminaries 

May-share [28], may-alias [21] and purity [14] analyses are used as pre-existing
components, i.e., programs are assumed to have been analyzed w.r.t. these
properties by means of state-of-the-art tools¹. Two reference variables v and w
share in σ iff $R^*(\sigma,\sigma(v)) \cap R^*(\sigma,\sigma(w)) \neq \emptyset$;
also, they alias in σ if they point to the same location, namely, if
$\sigma(v) = \sigma(w) \in dom(\sigma)$. Any non-null reference variable shares
and aliases with itself; also, both are symmetric relations. The i-th argument
of a method m is said to be pure if m does not update the data structure to
which the argument initially points. For sharing and purity, the analysis
proposed by Genaim and Spoto [14] (based on previous work by Secci and
Spoto [28]) can be applied: with it,

1. it is possible to know if v may share with w at any program point (denoted
by the sharing statement $(v \bullet w)$); and

2. for each method m, a denotation $SP_m$ is given: for a set of pairs $I_{sp}$
which safely describes the sharing between actual arguments in the input state,
$I'_{sp} = SP_m(I_{sp})$ is such that (i) if $(v \bullet w) \in I'_{sp}$, then v
and w might share during the execution of m; and (ii) $v_i \in I'_{sp}$ means
that the i-th argument might be non-pure.



¹ One could argue that aliasing and sharing analyses benefit from reachability
information, so that all the components should rather work "in parallel";
however, for the sake of this presentation, the three components (sharing,
aliasing, and reachability-cyclicity) are assumed to be independent. See
Section 5.4 for further discussion about the interplay between all the analyses.
According to the theory of Abstract Interpretation and to previous work, sharing
and purity analysis can be defined as an abstract semantics over the abstract
domain $\mathcal{I}^{sp}_\tau$, whose elements $I_{sp}$ contain may-share
statements $(v \bullet w)$ and may-be-non-pure statements $v$. Abstraction and
concretization functions $\alpha_{sp}$ and $\gamma_{sp}$ are defined in the
standard way [14]: in particular, $\gamma_{sp}(I_{sp})$ contains all the states
where the pairs of variables mentioned in sharing statements are the only ones
which can possibly share, while variables mentioned in may-be-non-pure
statements are the only ones which can possibly be non-pure.

As for aliasing, the abstract domain $\mathcal{I}^{al}_\tau$ contains sets of
may-alias statements $(v \cdot w)$: if $(v \cdot w) \in I_{al}$, then its
concretization $\gamma_{al}(I_{al})$ contains both states where v and w actually
alias and states where they do not. It is assumed that this information is
available at each program point as a set of may-alias statements.

In the following, the domain $\mathcal{I}^{s}_\tau$ will be the reduced product
between $\mathcal{I}^{sp}_\tau$ and $\mathcal{I}^{al}_\tau$, and combines sharing,
aliasing, and purity information. As usual, $\gamma_s(\langle I_{sp}, I_{al}\rangle)$
is defined as $\gamma_{sp}(I_{sp}) \cap \gamma_{al}(I_{al})$, while $\alpha_s(X)$
is defined as $(\alpha_{sp}(X) \cup \alpha_{al}(X))_{=}$, where $=$ means that
abstract elements with the same concretization have been unified (i.e., the
product has been reduced).

Abusing notation, from now on $I_s$ will often be used to denote an abstract
value without specifying the abstract domain it belongs to. The use of $I_s$
will be clear from the context: for example, writing $\gamma_{al}(I_s)$ means
applying $\gamma_{al}$ to the part of $I_s$ which represents aliasing information.

Moreover, an abstract element $\langle I_r, I_c\rangle \in \mathcal{I}^{rc}_\tau$
will be represented by the set $I = I_r \cup I_c$; therefore, $v \rightsquigarrow w \in I$
and $\circlearrowleft^{v} \in I$ are shorthands for, respectively,
$v \rightsquigarrow w \in I_r$ and $\circlearrowleft^{v} \in I_c$. The operation
$\exists v.I$ (projection) removes any statement about v from I, while $I[v/w]$
(renaming) renames v to w in I. For the sake of simplicity, class-reachability
and class-cyclicity are taken into account implicitly: a new statement
$v \rightsquigarrow w$ is not added to an abstract state if
$v \rightsquigarrow w \notin \mathcal{R}^\tau$, while a statement
$\circlearrowleft^{v}$ is not added if $\circlearrowleft^{v} \notin \mathcal{C}^\tau$.
It is important to point out that information about class-reachability and
class-cyclicity (i.e., whether $\kappa_1$ reaches $\kappa_2$, or whether $\kappa$
is cyclic) can be computed statically, before performing any acyclicity
analysis. Therefore, it can be assumed that such information is available
whenever it is necessary to decide whether a new reachability or cyclicity
statement belongs to $\mathcal{R}^\tau$ or $\mathcal{C}^\tau$.
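To make projection and renaming concrete, here is a minimal Python sketch. It assumes, purely for illustration, that an abstract state is a set of tuples: `("reach", v, w)` stands for $v \rightsquigarrow w$ and `("cyc", v)` for $\circlearrowleft^{v}$. The helper names `project`, `rename` and `add_admissible` are hypothetical, and membership in $\mathcal{R}^\tau$ or $\mathcal{C}^\tau$ is abstracted as a predicate supplied by the caller.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

# ("reach", v, w) encodes v ⇝ w; ("cyc", v) encodes the cyclicity statement on v.
Stmt = Tuple[str, ...]
State = FrozenSet[Stmt]


def project(v: str, state: Iterable[Stmt]) -> State:
    """∃v.I: drop every statement that mentions v."""
    return frozenset(st for st in state if v not in st[1:])


def rename(state: Iterable[Stmt], mapping: Dict[str, str]) -> State:
    """I[v/w]: rename variables simultaneously according to `mapping`."""
    return frozenset(
        (st[0],) + tuple(mapping.get(x, x) for x in st[1:]) for st in state
    )


def add_admissible(state: Iterable[Stmt], new: Iterable[Stmt],
                   admissible: Callable[[Stmt], bool]) -> State:
    """Add only the new statements allowed by class-reachability/cyclicity."""
    return frozenset(state) | frozenset(st for st in new if admissible(st))


I = frozenset({("reach", "this", "c"), ("reach", "this", "p"), ("cyc", "c")})
print(sorted(project("c", I)))                      # [('reach', 'this', 'p')]
print(sorted(rename(project("c", I), {"p": "q"})))  # [('reach', 'this', 'q')]
```

Renaming goes through a single mapping, so it is simultaneous; this matters whenever the old and new variable names overlap, as happens with actual and formal parameters.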

5.2. The abstract semantics 

An abstract denotation ^ from ri to T2 is a partial map from I^^ to . It 
describes how the abstract input state changes when a piece of code is executed. 
The set of all abstract denotations from ti to T2 is denoted by S(ri,T2). As in 
the concrete setting, interpretations provide abstract denotations for methods in 
terms of their input and output arguments. An interpretation C maps methods 
to abstract denotations, and is such that C('ti) G S(m*, U {out}). Note that 
the range of such denotations is m* U {out}, instead of {out} (as in the concrete 
semantics): this point will get clarified below. Finally, denotes the set of all 
(abstract) interpretations. 
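$$
\begin{array}{ll}
(1e) & \mathcal{E}^{rc}[\![n]\!](I) = I\\
(2e) & \mathcal{E}^{rc}[\![\mathtt{null}]\!](I) = I\\
(3e) & \mathcal{E}^{rc}[\![\mathtt{new}\ \kappa]\!](I) = I\\
(4e) & \mathcal{E}^{rc}[\![v]\!](I) = \text{if } \tau(v)=\mathtt{int} \text{ then } I \text{ else } I \cup I[v/\rho]\\
(5e) & \mathcal{E}^{rc}[\![v.f]\!](I) = \text{if } f \text{ has type } \mathtt{int} \text{ then } I \text{ else } I \cup I' \text{ where}\\
 & \quad I' = I[v/\rho] \cup \{ w \rightsquigarrow \rho \mid (w \bullet v) \in I_s \} \cup \{ \rho \rightsquigarrow \rho \mid \circlearrowleft^{v} \in I \}\\
(6e) & \mathcal{E}^{rc}[\![exp_1 \oplus exp_2]\!](I) = \exists\rho.\,\mathcal{E}^{rc}[\![exp_2]\!](\exists\rho.\,\mathcal{E}^{rc}[\![exp_1]\!](I))\\
(7e) & \mathcal{E}^{rc}[\![v_0.m(v_1,\ldots,v_n)]\!](I) = I \cup I_m \cup I_1 \cup I_2 \cup I_3 \cup I_4 \text{ where}\\
 & \quad I_0 = \exists(\tau \setminus \bar v).\, I\\
 & \quad I_m = \bigcup\{ (\iota(m)(I_0[\bar v/\hat m]))[\hat m/\bar v, out/\rho] \mid m \text{ might be called here}^{a} \}\\
 & \quad I'_s = \{ (v_i \bullet v_j) \mid v_i, v_j \in \bar v,\ (v_i \bullet v_j) \in I_s \} \cup \{ v_i \mid v_i \in \bar v,\ v_i \in I_s \}\\
 & \quad I''_s = \bigcup\{ SP_m(I'_s[\bar v/\hat m])[\hat m/\bar v, out/\rho] \mid m \text{ might be called here} \}\\
 & \quad I_1 = \{ w_1 \rightsquigarrow w_2 \mid (v_i \rightsquigarrow v_j \in I_m) \wedge (v_i \in I''_s) \wedge ((w_1 \bullet v_i) \in I_s) \wedge ((v_j \rightsquigarrow w_2 \in I) \vee ((w_2 \cdot v_j) \in I_s)) \}\\
 & \quad I_2 = \{ w_1 \rightsquigarrow w_2 \mid ((v_i \bullet v_j) \in I''_s) \wedge (v_i \in I''_s) \wedge ((v_i \bullet w_1) \in I_s) \wedge (v_j \rightsquigarrow w_2 \in I) \}\\
 & \quad I_3 = \bigcup\{ (I_1 \cup I_2)[v/\rho] \mid (v \cdot \rho) \text{ after the call} \}\\
 & \quad I_4 = \{ \circlearrowleft^{w} \mid ((w \bullet v_i) \in I_s) \wedge (v_i \in I''_s) \wedge (\circlearrowleft^{v_i} \in I_m) \}
\end{array}
$$

^a See Section 5.2.5.

Figure 4: Abstract denotations for expressions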



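Figures 4 and 5 depict abstract denotations. An expression denotation
$\mathcal{E}^{rc}[\![exp]\!]$ maps abstract states from $\mathcal{I}^{rc}_\tau$ to
abstract states from $\mathcal{I}^{rc}_{\tau_1}$, where $\tau_1 = \tau \cup \{\rho\}$,
while a command denotation $\mathcal{C}^{rc}[\![com]\!]$ maps $\mathcal{I}^{rc}_\tau$
to $\mathcal{I}^{rc}_\tau$.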

In the definition, the abstract element $I_s$ contains the sharing, aliasing, and
purity information pre-computed by other analyses, and referring to the program
point of interest².

5.2.1. Expressions 

An expression denotation $\mathcal{E}^{rc}[\![exp]\!]$ adds to an input state I
those reachability and cyclicity statements which result from evaluating exp.

Nothing is added to I in cases (1e): $\mathcal{E}^{rc}[\![n]\!]$, (2e):
$\mathcal{E}^{rc}[\![\mathtt{null}]\!]$, and (3e): $\mathcal{E}^{rc}[\![\mathtt{new}\ \kappa]\!]$,
since the expression is evaluated without side effects to, respectively, an
integer value, null, or a newly allocated object which is not related to any
other location with respect to reachability.

The same reasoning explains why the returned abstract value is also I in case
(4e) when $\tau(v) = \mathtt{int}$, and in case (5e) when f is an int field.

In case (4e), when the type of v is not int, the result variable ρ has the same
abstract behavior as v. Therefore, the semantics returns I, together with a
cloned version $I[v/\rho]$ where statements about v have been replaced by
renamed statements about ρ.



² Note that $I_s$ could be represented explicitly as an input to the abstract
semantics, next to I, but it is omitted for the sake of clarity.
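$$
\begin{array}{ll}
(1c) & \mathcal{C}^{rc}[\![v{:=}exp]\!](I) = (\exists v.\,\mathcal{E}^{rc}[\![exp]\!](I))[\rho/v]\\
(2c) & \mathcal{C}^{rc}[\![v.f{:=}exp]\!](I) = \exists\rho.(I' \cup I_r \cup I_c) \text{ where}\\
 & \quad I'_0 = \mathcal{E}^{rc}[\![exp]\!](I)\\
 & \quad I' = condRemove(I'_0, v, f)\\
 & \quad I_r = \{ w_1 \rightsquigarrow w_2 \mid (((w_1 \cdot v) \in I'_s) \vee (w_1 \rightsquigarrow v \in I')) \wedge (((\rho \cdot w_2) \in I'_s) \vee (\rho \rightsquigarrow w_2 \in I')) \}\\
 & \quad I_c = \{ \circlearrowleft^{w} \mid ((\rho \rightsquigarrow v \in I') \vee ((\rho \cdot v) \in I'_s) \vee (\circlearrowleft^{\rho} \in I')) \wedge (((w \cdot v) \in I'_s) \vee (w \rightsquigarrow v \in I')) \}\\
(3c) & \mathcal{C}^{rc}[\![\mathtt{if}\ exp\ \mathtt{then}\ com_1\ \mathtt{else}\ com_2]\!](I) = \mathcal{C}^{rc}[\![com_1]\!](I) \cup \mathcal{C}^{rc}[\![com_2]\!](I)\\
(4c) & \mathcal{C}^{rc}[\![\mathtt{while}\ exp\ \mathtt{do}\ com]\!](I) = \xi(I) \text{ where } \xi = lfp(\lambda\xi.\lambda I.\, I \cup \xi(\mathcal{C}^{rc}[\![com]\!](I)))\\
(5c) & \mathcal{C}^{rc}[\![\mathtt{return}\ exp]\!](I) = \mathcal{E}^{rc}[\![exp]\!](I)[\rho/out]\\
(6c) & \mathcal{C}^{rc}[\![com_1; com_2]\!](I) = \mathcal{C}^{rc}[\![com_2]\!](\mathcal{C}^{rc}[\![com_1]\!](I))
\end{array}
$$

Figure 5: Abstract denotations for commands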



In case (5e), when f is a reference field, the following information is added
to I:

• statements for v, which are cloned for ρ;

• $w \rightsquigarrow \rho$, if w might share with v; note that $v \rightsquigarrow \rho$
is always added since $(v \bullet v) \in I_s$ (clearly, v cannot be null); if v and
w reach a common location (which implies that they share), but do not reach each
other, then, conservatively, the reachability statement $w \rightsquigarrow \rho$
must be added because v.f could be exactly the common location which is reached
by both v and w;

• if v might be cyclic, then, for soundness, $\rho \rightsquigarrow \rho$; note
that, in this case, $\circlearrowleft^{\rho}$ is also guaranteed to have been
previously added to the abstract state (by cloning $\circlearrowleft^{v}$).
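Rules (4e) and (5e) can be sketched in Python as follows. This is a minimal sketch under illustrative assumptions: statements are tuples (`("reach", v, w)` for $v \rightsquigarrow w$, `("cyc", v)` for $\circlearrowleft^{v}$), the string `"rho"` stands for ρ, the may-share part of $I_s$ is passed as a set of unordered pairs, and every variable is taken to share with itself (v cannot be null when v.f is evaluated). The names `clone`, `eval_var` and `eval_field` are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Stmt = Tuple[str, ...]
RHO = "rho"  # the result variable ρ


def clone(state: Iterable[Stmt], v: str) -> Set[Stmt]:
    """I[v/ρ]: rename v to ρ (unioned with I, this clones v's statements)."""
    return {(st[0],) + tuple(RHO if x == v else x for x in st[1:]) for st in state}


def eval_var(state: Set[Stmt], v: str, is_int: bool = False) -> Set[Stmt]:
    """(4e): I if v is an int, otherwise I ∪ I[v/ρ]."""
    return set(state) if is_int else set(state) | clone(state, v)


def eval_field(state: Set[Stmt], v: str, sharing: Set[FrozenSet[str]],
               is_int_field: bool = False) -> Set[Stmt]:
    """(5e): E[v.f](I); `sharing` holds the may-share pairs (w•v) of I_s."""
    if is_int_field:
        return set(state)
    names = {v} | {x for st in state for x in st[1:]} | {x for p in sharing for x in p}
    shares = {w for w in names if w == v or frozenset((w, v)) in sharing}
    extra = clone(state, v)                        # I[v/ρ]
    extra |= {("reach", w, RHO) for w in shares}   # w ⇝ ρ for every w sharing with v
    if ("cyc", v) in state:
        extra.add(("reach", RHO, RHO))             # v may be cyclic, so ρ ⇝ ρ
    return set(state) | extra


# Data of Example 5.1 (assumed sharing: this shares with c and p, c shares with p)
I14 = {("reach", "this", "c"), ("reach", "this", "p")}
sh = {frozenset(("this", "c")), frozenset(("this", "p")), frozenset(("c", "p"))}
print(sorted(eval_field(I14, "c", sh) - I14))
# [('reach', 'c', 'rho'), ('reach', 'p', 'rho'), ('reach', 'this', 'rho')]
```

Note that `this ⇝ rho` is produced twice, once by the clone and once through sharing; since states are sets, duplicates are harmless.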

In case (6e), $\mathcal{E}^{rc}[\![exp_1 \oplus exp_2]\!]$, the expression $exp_1$
is first analyzed, then $exp_2$ is analyzed on the resulting abstract state.
Note that, in both cases, ρ is removed since the return value always has type int.

Finally, method calls (7e), $\mathcal{E}^{rc}[\![v_0.m(v_1,\ldots,v_n)]\!]$, will be
explained later, after introducing denotations for commands.

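Example 5.1. Consider c:=c.next at line 15 in Figure 1. Evaluating the
denotation $\mathcal{E}^{rc}[\![c.next]\!](I_{14})$ results in
$\{this \rightsquigarrow c, this \rightsquigarrow p, this \rightsquigarrow \rho, c \rightsquigarrow \rho, p \rightsquigarrow \rho\}$.
The statement $this \rightsquigarrow \rho$ is added since $this \rightsquigarrow c \in I_{14}$;
$c \rightsquigarrow \rho$ and $p \rightsquigarrow \rho$ are added because $(c \bullet c)$
and $(c \bullet p)$ hold after line 14.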

5.2.2. Variable assignment 

The denotation (1c), $\mathcal{C}^{rc}[\![v{:=}exp]\!]$, computes
$\mathcal{E}^{rc}[\![exp]\!](I)$, removes any statement about v since v takes a new
value, and finally renames ρ to v. Note that it is safe to remove statements
about v since the relevant information has first been cloned to ρ.
Example 5.2. Consider, again, line 15 in Figure 1. Evaluating the denotation
$\mathcal{C}^{rc}[\![c{:=}c.next]\!](I_{14})$ first computes
$\mathcal{E}^{rc}[\![c.next]\!](I_{14})$ as in Example 5.1. Then, statements
involving c are removed, which results in
$\{this \rightsquigarrow p, this \rightsquigarrow \rho, p \rightsquigarrow \rho\}$, and,
finally, ρ is renamed to c, giving
$\{this \rightsquigarrow p, this \rightsquigarrow c, p \rightsquigarrow c\}$. Note that
$this \rightsquigarrow c$ is reinserted (by renaming $this \rightsquigarrow \rho$) after
being deleted by $\exists c$. Also, note that $c \rightsquigarrow \rho$ has been
removed by $\exists c$, so that, correctly, c is not considered to reach itself
after the assignment.
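Rule (1c) is just a projection followed by a renaming. The Python sketch below uses an illustrative tuple encoding of statements (`("reach", v, w)` for $v \rightsquigarrow w$, `"rho"` for ρ); the function name `assign` is hypothetical. It takes the already computed $\mathcal{E}^{rc}[\![exp]\!](I)$ as input and replays Example 5.2 on the state obtained in Example 5.1.

```python
from typing import Set, Tuple

Stmt = Tuple[str, ...]
RHO = "rho"  # the result variable ρ


def assign(evaluated: Set[Stmt], v: str) -> Set[Stmt]:
    """(1c): (∃v.E[exp](I))[ρ/v], where `evaluated` is E[exp](I)."""
    kept = {st for st in evaluated if v not in st[1:]}                 # ∃v
    return {(st[0],) + tuple(v if x == RHO else x for x in st[1:])     # [ρ/v]
            for st in kept}


# E[c.next](I14) as computed in Example 5.1
e_c_next = {("reach", "this", "c"), ("reach", "this", "p"),
            ("reach", "this", RHO), ("reach", "c", RHO), ("reach", "p", RHO)}
after = assign(e_c_next, "c")
print(sorted(after))
# [('reach', 'p', 'c'), ('reach', 'this', 'c'), ('reach', 'this', 'p')]
```

Because `c ⇝ rho` is dropped by the projection before the renaming, no `c ⇝ c` statement can appear, which matches the remark at the end of Example 5.2.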

5.2.3. Field update 

The denotation (2c), $\mathcal{C}^{rc}[\![v.f{:=}exp]\!]$, accounts for field
updates. The set $I'_0$ results from computing $\mathcal{E}^{rc}[\![exp]\!](I)$, as
usual. The following step is to apply an optimization (called the single-field
optimization in the following) which allows removing statements after inspecting
the declarations of the classes involved in the update. The abstract value $I'$
is computed from $I'_0$ by the function $condRemove(I'_0, v, f)$, which is defined
as follows. Let κ be the declared class of v (this means that the runtime type
of v can be κ or any of its subclasses); then $I'$ is obtained from $I'_0$ by

• removing $\circlearrowleft^{v}$ if (1) f is the only reference field of any
$\kappa' \leq \kappa$; or (2) all the other reference fields of any
$\kappa' \leq \kappa$ have a declared class such that neither it nor any of its
subclasses is a cyclic class;

• similarly, removing any statement $v \rightsquigarrow w$ such that f is the only
field of any $\kappa' \leq \kappa$ whose declared class (or any of its subclasses)
reaches the declared class of w (or any of its subclasses);

• leaving all the statements in $I'_0$ if these conditions do not hold.

Basically, this single-field optimization identifies cases where the only cycles
or reachability paths starting from v must necessarily traverse f, either because
f is the only field, or because no other field makes such cycles or paths
possible. It must be pointed out that this optimization relies on information
about classes and fields which can be obtained statically by code inspection,
and was not included in the original analysis of Ghiya and Hendren [17]. The
sets $I_r$ and $I_c$ capture the effect of executing v.f:=ρ on $I'$. Moreover,
$I'_s$ contains sharing, purity and aliasing information after evaluating exp.
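The single-field optimization only needs static class information. Below is a possible Python sketch of condRemove, under hypothetical input conventions: `ref_fields` flattens the reference fields of the declared class of v and of all its subclasses into one mapping from field name to declared class, `cyclic_classes` is already closed under subclasses, and `class_reaches(k1, k2)` says whether an object of class k1 may be, or may reach, an object of class k2. Statements are tuples (`("reach", v, w)`, `("cyc", v)`). Case (1) of the first bullet, where f is the only reference field, is a special case of (2) because the "other fields" test is then vacuously true, so a single check covers both.

```python
from typing import Callable, Dict, Set, Tuple

Stmt = Tuple[str, ...]


def cond_remove(state: Set[Stmt], v: str, f: str,
                ref_fields: Dict[str, str],
                cyclic_classes: Set[str],
                class_reaches: Callable[[str, str], bool],
                class_of: Dict[str, str]) -> Set[Stmt]:
    """Sketch of condRemove(I'_0, v, f) from Section 5.2.3."""
    others = [k for g, k in ref_fields.items() if g != f]
    result = set(state)
    # A cycle through v can only use f if no other field may lead to a cycle.
    if not any(k in cyclic_classes for k in others):
        result.discard(("cyc", v))
    # v ⇝ w can only use f if no other field may lead to the class of w.
    for st in list(result):
        if st[0] == "reach" and st[1] == v:
            if not any(class_reaches(k, class_of[st[2]]) for k in others):
                result.discard(st)
    return result


# Hypothetical classes: Node has fields next (Node) and data (Data);
# Data is not cyclic and cannot reach Node.
def reaches(k1: str, k2: str) -> bool:
    return k1 == k2 or (k1, k2) == ("Node", "Data")


I0 = {("reach", "v", "w"), ("cyc", "v"), ("reach", "u", "v")}
result = cond_remove(I0, "v", "next", {"next": "Node", "data": "Data"},
                     {"Node"}, reaches, {"v": "Node", "w": "Node", "u": "Node"})
print(result)   # {('reach', 'u', 'v')}
```

Statements that do not start from v, such as `u ⇝ v`, are never touched: the optimization only reasons about paths leaving v.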

The following reachability statements are added: for any $w_1$ which might either
alias with v or reach v (formalized as $((w_1 \cdot v) \in I'_s) \vee (w_1 \rightsquigarrow v \in I')$),
and any $w_2$ aliasing with ρ or reachable from it (formalized as
$((\rho \cdot w_2) \in I'_s) \vee (\rho \rightsquigarrow w_2 \in I')$), the statement
$w_1 \rightsquigarrow w_2$ is added since the new path created by the update
implies that $w_1$ can reach $w_2$. This accounts for all possible paths which can
be created by adding a direct link from v to ρ through f.

New cyclicity statements are contained in $I_c$. There are three possible
scenarios where v might become cyclic:

• ρ reaches v, so that a cycle from v to itself is created;

• ρ aliases with v, so that v reaches itself with a path of length 1 (e.g., the
command y.f:=y); or

• ρ is cyclic, so that v becomes indirectly cyclic.

Whenever one of these scenarios occurs (formalized as
$(\rho \rightsquigarrow v \in I') \vee ((\rho \cdot v) \in I'_s) \vee (\circlearrowleft^{\rho} \in I')$),
any variable w aliasing with v or reaching it (formalized as
$((w \cdot v) \in I'_s) \vee (w \rightsquigarrow v \in I')$) has to be considered as
possibly cyclic.

Example 5.3. Consider line 20 in Figure 1. The abstract value before such a
line, produced at line 17, is
$I_{17} = \{this \rightsquigarrow c, this \rightsquigarrow p, p \rightsquigarrow c, n \rightsquigarrow c\}$.
The evaluation of $\mathcal{C}^{rc}[\![p.next{:=}n]\!](I_{17})$ at line 20 adds a
new statement $p \rightsquigarrow n$, as expected. Moreover, it also adds
$this \rightsquigarrow n$ since this was reaching p, and both $p \rightsquigarrow c$
and $this \rightsquigarrow c$ (which, however, were already contained in $I_{17}$)
since n was reaching c.
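The sets $I_r$ and $I_c$ can be computed directly from $I'$ and the may-alias pairs of $I'_s$. The following minimal Python sketch covers the rest of rule (2c). It assumes condRemove has already been applied (so `i_prime` stands for $I'$), encodes statements as tuples (`("reach", v, w)`, `("cyc", v)`), uses the string `"rho"` for ρ, and treats every variable as aliasing with itself. The name `field_update` is hypothetical. The usage lines replay Example 5.3, where ρ aliases n after evaluating n. The test additionally replays line 8 of Example 5.9 (method connect), where ρ aliases this.

```python
from typing import FrozenSet, Set, Tuple

Stmt = Tuple[str, ...]
RHO = "rho"  # the result variable ρ


def field_update(i_prime: Set[Stmt], v: str,
                 alias: Set[FrozenSet[str]]) -> Set[Stmt]:
    """Sketch of ∃ρ.(I' ∪ I_r ∪ I_c) for v.f := exp (rule 2c)."""
    def aliases(a: str, b: str) -> bool:
        return a == b or frozenset((a, b)) in alias

    names = ({v, RHO} | {x for st in i_prime for x in st[1:]}
             | {x for p in alias for x in p})
    # w1 may alias or reach v; w2 may alias with or be reached by ρ
    sources = {w for w in names if aliases(w, v) or ("reach", w, v) in i_prime}
    targets = {w for w in names if aliases(RHO, w) or ("reach", RHO, w) in i_prime}
    i_r = {("reach", w1, w2) for w1 in sources for w2 in targets}
    # v may become cyclic if ρ reaches v, aliases with v, or is cyclic
    closes_cycle = (("reach", RHO, v) in i_prime or aliases(RHO, v)
                    or ("cyc", RHO) in i_prime)
    i_c = {("cyc", w) for w in sources} if closes_cycle else set()
    result = set(i_prime) | i_r | i_c
    return {st for st in result if RHO not in st[1:]}      # ∃ρ


# Example 5.3: p.next := n, where evaluating n clones n's statements into ρ
i17 = {("reach", "this", "c"), ("reach", "this", "p"),
       ("reach", "p", "c"), ("reach", "n", "c")}
out = field_update(i17 | {("reach", RHO, "c")}, "p", {frozenset(("n", RHO))})
print(sorted(out - i17))   # [('reach', 'p', 'n'), ('reach', 'this', 'n')]
```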

5.2.4. Conditions, loops, composition, and return command

Rules (3c): $\mathcal{C}^{rc}[\![\mathtt{if}\ exp\ \mathtt{then}\ com_1\ \mathtt{else}\ com_2]\!]$,
(4c): $\mathcal{C}^{rc}[\![\mathtt{while}\ exp\ \mathtt{do}\ com]\!]$, and (6c):
$\mathcal{C}^{rc}[\![com_1; com_2]\!]$ are quite straightforward and correspond,
respectively, to the if conditional, the while loop, and command composition.
Finally, rule (5c): $\mathcal{C}^{rc}[\![\mathtt{return}\ exp]\!]$ corresponds to the
return command, and behaves, as expected, like the execution of out:=exp.
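Rule (4c) unfolds to $\xi(I) = \bigcup_{k \geq 0} \mathcal{C}^{rc}[\![com]\!]^k(I)$, which is what Kleene iteration from the empty denotation converges to. Since a method has finitely many variables, there are finitely many abstract states, so the sequence of iterates eventually revisits a state and the union can be computed exactly. The Python sketch below takes the loop body as a function; the body used here is a hand-written, simplified and hypothetical version of curr:=curr.next from Example 5.9. It assumes that this and curr share and ignores cyclicity. Statements are encoded as tuples (`("reach", v, w)` for $v \rightsquigarrow w$).

```python
from typing import Callable, FrozenSet, Set, Tuple

Stmt = Tuple[str, ...]


def while_denotation(body: Callable[[FrozenSet[Stmt]], Set[Stmt]],
                     state: Set[Stmt]) -> Set[Stmt]:
    """(4c): the union of body^k(I) for k >= 0."""
    result: Set[Stmt] = set()
    seen = set()
    current = frozenset(state)
    while current not in seen:      # finitely many states: this terminates
        seen.add(current)
        result |= current
        current = frozenset(body(current))
    return result


def ren(st: Stmt, old: str, new: str) -> Stmt:
    return (st[0],) + tuple(new if x == old else x for x in st[1:])


def curr_next(state: FrozenSet[Stmt]) -> Set[Stmt]:
    """Simplified curr := curr.next, assuming (this•curr) and no cyclicity."""
    e = set(state) | {ren(st, "curr", "rho") for st in state}   # clone curr into ρ
    e |= {("reach", "this", "rho"), ("reach", "curr", "rho")}   # sharing with curr
    return {ren(st, "rho", "curr") for st in e if "curr" not in st[1:]}


print(while_denotation(curr_next, set()))   # {('reach', 'this', 'curr')}
```

The second iteration produces the same state as the first, which mirrors the observation in Example 5.9 that another iteration of the loop does not change anything.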

5.2.5. Method calls 

Rule (7e), $\mathcal{E}^{rc}[\![v_0.m(v_1,\ldots,v_n)]\!]$, propagates the effect of a
method call to the calling context, as follows:

1. the abstract state I is projected on the actual parameters $\bar v$, thus
obtaining $I_0$; this is needed since the denotation of the callee is given in
terms of its parameters;

2. the denotation of each method m which can possibly be called at runtime is
taken from the current interpretation, namely, ι(m), and applied to
$I_0[\bar v/\hat m]$, which is the result of renaming the actual parameters
$\bar v$ to the formal parameters $\hat m$ in $I_0$;

3. formal parameters are renamed back to the actual parameters (and out is
renamed to ρ) in the resulting state $\iota(m)(I_0[\bar v/\hat m])$, and the states
obtained from all possible callees are merged into $I_m$.

Step 2 takes more than one method into account because, in an object-oriented
language with inheritance, it is in general not possible to decide, at compile
time, which method instance (among various method declarations whose signature
is compatible with the type of the actual parameters and the expected return
value) will actually be invoked after calling the function lkp (Section 3).
Therefore, the abstract semantics conservatively takes the union of all of them.

In the definition, $I'_s$ is a safe approximation of the sharing among actual
parameters, and $I''_s$ safely approximates the sharing and purity information
after the method call. The definitions of $I_1$, $I_2$, $I_3$, and $I_4$ account
for the propagation of the effects of the method execution in the calling context:






Node f(Node a, Node b, Node c) {
  a.next:=b;
  c.next:=this;
  return b.g(c);
}

Node g(Node y) {
  this.next:=y;
  return this;
}

Node h(Node y) {
  this.next:=y;
  y:=null;
  return this;
}

Node k(Node y) {
  u:=y;
  this.next:=y;
  y:=null;
  return this;
}

Figure 6: Some more examples



• $I_1$ states that, if the call creates reachability from $v_i$ to $v_j$, then any
$w_1$ sharing with $v_i$ before the call might reach any $w_2$ which is reachable
from $v_j$ or aliasing with $v_j$. Note that adding these statements is necessary
only if $v_i$ is updated in the body of some m (this information is conservatively
represented in $I''_s$, so that the condition $v_i \in I''_s$ must be checked):
otherwise, it is guaranteed that no path from $w_1$ to $w_2$ will be created
during the call.

• $I_2$ states that, if the call makes $v_i$ share with $v_j$, then any $w_1$
sharing with $v_i$ might reach any $w_2$ reachable from $v_j$. Again, this is
required only if $v_i$ is updated in the body of some m.

• $I_3$ contains the information about any variable v aliasing with ρ, which is
cloned for ρ.

• $I_4$ includes the possible cyclicity of anything sharing with an argument which
might become cyclic.
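The propagation sets described in the bullets can be sketched in Python. This is a minimal, hypothetical rendering of the prose, not of every detail of Figure 4. Statements are tuples (`("reach", v, w)`, `("cyc", v)`). `share` and `alias` are the may-share and may-alias pairs before the call, `share_after` holds the may-share pairs among actuals after the call, `non_pure` lists the actuals the callee may update, and every variable is taken to share and alias with itself. The usage lines take the data of Example 5.4 below, assuming that b is the only argument updated by g.

```python
from typing import FrozenSet, List, Set, Tuple

Stmt = Tuple[str, ...]
Pairs = Set[FrozenSet[str]]


def propagate(state: Set[Stmt], i_m: Set[Stmt], actuals: List[str],
              share: Pairs, alias: Pairs, share_after: Pairs,
              non_pure: Set[str]) -> Tuple[Set[Stmt], Set[Stmt], Set[Stmt]]:
    """Return (I_1, I_2, I_4) for a call with the given actual parameters."""
    def rel(a: str, b: str, pairs: Pairs) -> bool:
        return a == b or frozenset((a, b)) in pairs

    names = {x for st in state for x in st[1:]} | set(actuals)
    names |= {x for p in share | alias for x in p}

    def reached(vj: str) -> Set[str]:
        return {st[2] for st in state if st[0] == "reach" and st[1] == vj}

    i1: Set[Stmt] = set()
    for st in i_m:   # the call creates v_i ⇝ v_j, and v_i may be updated
        if st[0] == "reach" and st[1] in non_pure and st[2] in actuals:
            vi, vj = st[1], st[2]
            w2s = reached(vj) | {w for w in names if rel(w, vj, alias)}
            i1 |= {("reach", w1, w2) for w1 in names if rel(w1, vi, share)
                   for w2 in w2s}
    i2: Set[Stmt] = set()
    for vi in non_pure:   # the call makes v_i share with v_j
        for vj in actuals:
            if vj != vi and frozenset((vi, vj)) in share_after:
                i2 |= {("reach", w1, w2) for w1 in names if rel(w1, vi, share)
                       for w2 in reached(vj)}
    # anything sharing with an argument that may become cyclic
    i4 = {("cyc", w) for vi in actuals if ("cyc", vi) in i_m
          for w in names if rel(w, vi, share)}
    return i1, i2, i4


I = {("reach", "a", "b"), ("reach", "c", "this")}
i_m = {("reach", "b", "c"), ("reach", "rho", "c")}
i1, i2, i4 = propagate(I, i_m, ["b", "c"],
                       share={frozenset(("a", "b")), frozenset(("c", "this"))},
                       alias=set(), share_after={frozenset(("b", "c"))},
                       non_pure={"b"})
print(sorted(i1))
# [('reach', 'a', 'c'), ('reach', 'a', 'this'), ('reach', 'b', 'c'), ('reach', 'b', 'this')]
```

Here $I_2$ adds nothing beyond $I_1$ and $I_4$ is empty, in line with steps 4 and 5 of Example 5.4.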

The final result of processing a method call is the union $I \cup I_m \cup I_1 \cup I_2 \cup I_3 \cup I_4$.

Example 5.4. Consider methods f and g of Figure 6, and assume that both are
defined in the class Node. Let ξ be a denotation for g such that
$\xi(\emptyset) = \{this \rightsquigarrow y, out \rightsquigarrow y\}$. This example shows
how an abstract state is transformed by executing the code of f. The first two
commands in f transform the input abstract state into
$I = \{a \rightsquigarrow b, c \rightsquigarrow this\}$. Then, the denotation of g is
plugged into the calling context, as follows:

1. I is projected on {b, c}, obtaining $I_0 = \emptyset$;

2. ξ(∅) is renamed such that this, y, and out are renamed to, respectively, b, c,
and ρ, and $I_m = \{b \rightsquigarrow c, \rho \rightsquigarrow c\}$ is obtained;

3. $a \rightsquigarrow this$ is added to $I_1$ since
$(b \rightsquigarrow c \in I_m) \wedge ((b \bullet a) \in I_s) \wedge (c \rightsquigarrow this \in I)$
is true; similarly, $b \rightsquigarrow this$, $a \rightsquigarrow c$ and
$a \rightsquigarrow \rho$ are also added to $I_1$;

4. no new statements have to be added because of $I_2$ or $I_3$;

5. $I_4$ is empty since nothing becomes cyclic in g;

6. finally, the denotation of return renames ρ to out in
$I \cup I_m \cup I_1 \cup I_4$, and obtains
$\{a \rightsquigarrow b, c \rightsquigarrow this, b \rightsquigarrow c, out \rightsquigarrow c, a \rightsquigarrow this, a \rightsquigarrow c, b \rightsquigarrow this, a \rightsquigarrow out\}$.

Next, the inference of a denotation for a method m is shown, which uses the
denotation $\mathcal{C}^{rc}[\![m^b]\!]$ of its code $m^b$. Example 5.5 introduces
the problems to be faced when trying to define a method denotation, and a
solution is discussed below.

Example 5.5. In Example 5.4, when analyzing b.g(c), the existence of a
denotation ξ for g such that $\xi(\emptyset) = \{this \rightsquigarrow y, out \rightsquigarrow y\}$
was assumed. Intuitively, this ξ(∅) could be computed using
$\mathcal{C}^{rc}[\![g^b]\!]$, as follows: the first command in g adds
$this \rightsquigarrow y$, and the second one adds $out \rightsquigarrow y$, which
results in the desired abstract state $\{this \rightsquigarrow y, out \rightsquigarrow y\}$.
After this result, one might think that $\mathcal{C}^{rc}[\![m^b]\!](I)$ is always
the right way to compute ξ(I), as just done. Yet, in general, this is not correct.
For example, suppose the call b.g(c) is replaced by b.h(c) (h is also defined in
Figure 6). The effect of this call should be the same as b.g(c), since both
methods make b reach c and b reach the return value. However, computing
$\mathcal{C}^{rc}[\![h^b]\!](\emptyset)$ has a different result: the first instruction
adds $this \rightsquigarrow y$, but the second one removes it since the value of y
is overwritten, and the third does not add anything. Therefore,
$\mathcal{C}^{rc}[\![h^b]\!](\emptyset) = \emptyset$, which is not sound to use as the
denotation of h on ∅.

The problem in Example 5.5 comes from the call-by-value parameter-passing style,
where, if the formal parameters are modified in the method, then the final
abstract state does not describe the actual parameters anymore. This is why the
expected reachability information is obtained for g (since it does not modify y),
while it is not in the case of h (since y is modified in the body). A common
solution to this problem is to mimic actual parameters by shallow variables or
ghost variables, i.e., new auxiliary variables which are initialized, when
entering the method, to the same values as the parameters, but are never
modified in the body.

Example 5.6. Consider methods h and k in Figure 6. Method k is the result of
instrumenting h with a shallow variable u, mimicking y. It is easy to verify that
$\mathcal{C}^{rc}[\![k^b]\!](\emptyset)$ comes to be
$\{this \rightsquigarrow u, out \rightsquigarrow u\}$, which includes the desired
reachability information.
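The effect of shallow variables can be checked with a small Python sketch of how one method denotation is built, following the construction formalized in Definition 5.7 below. It clones the parameters into shallow copies, runs the body, keeps only the shallow copies, this and out, and renames the copies back. The body of h is hand-coded under simplifying, hypothetical assumptions: the field update uses the fact that y and its shallow copy alias, and `return this` clones this into out. Statements are tuples (`("reach", v, w)`), and the `u_` prefix for shallow variables is an illustrative naming choice. Running the body on its own reproduces the unsound ∅ of Example 5.5.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Stmt = Tuple[str, ...]
Body = Callable[[Set[Stmt], Dict[str, str]], Set[Stmt]]


def rename(state: Iterable[Stmt], mapping: Dict[str, str]) -> Set[Stmt]:
    return {(st[0],) + tuple(mapping.get(x, x) for x in st[1:]) for st in state}


def method_denotation(body: Body, formals: List[str], state: Set[Stmt]) -> Set[Stmt]:
    shallow = {w: "u_" + w for w in formals if w != "this"}   # hypothetical naming
    i0 = set(state) | rename(state, shallow)                   # I ∪ I[w/u]
    i1 = body(i0, shallow)                                     # C[m^b](I_0)
    keep = set(shallow.values()) | {"this", "out"}
    i1 = {st for st in i1 if all(x in keep for x in st[1:])}   # ∃X
    return rename(i1, {u: w for w, u in shallow.items()})      # [u/w]


def body_h(state: Set[Stmt], shallow: Dict[str, str]) -> Set[Stmt]:
    u = shallow.get("y")
    # this.next := y  (y and its shallow copy alias, so this reaches both)
    s = set(state) | {("reach", "this", "y")}
    if u is not None:
        s.add(("reach", "this", u))
    # y := null  (forget everything about y)
    s = {st for st in s if "y" not in st[1:]}
    # return this  (out is a clone of this)
    return s | rename({st for st in s if "this" in st[1:]}, {"this": "out"})


print(sorted(body_h(set(), {})))                           # []
print(sorted(method_denotation(body_h, ["this", "y"], set())))
# [('reach', 'out', 'y'), ('reach', 'this', 'y')]
```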

The following definition characterizes the abstract denotational semantics of a
program P as the least fixpoint of an (abstract) transformer of interpretations.
Variables $\bar u$ play the role of shallow variables. Note that shallow variables
appear at the level of the semantics, rather than by transforming the program.

Definition 5.7. The abstract denotational semantics of a program P is the lfp of
the transformer

$$T_P(\iota) = \{\, m \mapsto \lambda I \in \mathcal{I}^{rc}_{\hat m}.\ (\exists X.\,\mathcal{C}^{rc}[\![m^b]\!](I \cup I[\bar w/\bar u]))[\bar u/\bar w] \mid m \in P \,\}$$

where $\hat m = \{this, w_1, \ldots, w_n\}$, and $\bar u$ is a variable set such that
$\bar u \cap \hat m = \emptyset$; moreover, $dom(\tau) = \hat m \cup \bar u$, and
$X = dom(\tau) \setminus (\bar u \cup \{this, out\})$.

The definition is explained in the following. The operator $T_P$ transforms the
interpretation ι by assigning a new denotation to each method m ∈ P, using those
in ι. The new denotation for m maps a given input abstract state
$I \in \mathcal{I}^{rc}_{\hat m}$ to an output abstract state over
$\hat m \cup \{out\}$, as follows:

1. it obtains an abstract state $I_0 = I \cup I[\bar w/\bar u]$ in which the
parameters $\bar w$ are cloned into the shallow variables $\bar u$;

2. it applies the denotation of the code of m on $I_0$, obtaining
$I_1 = \mathcal{C}^{rc}[\![m^b]\!](I_0)$;

3. all variables but $\bar u \cup \{this, out\}$ are eliminated from $I_1$ (using
$\exists X$); and

4. shallow variables $\bar u$ are finally renamed back to $\bar w$.

Soundness is addressed in Section 5.3; next, some examples are shown.

Example 5.8. Consider the following method

1  int mirror(Tree t) {
2    Tree l,r;
3
4    if (t=null) then {
5      return 0;
6    } else {
7      l:=t.left;
8      r:=t.right;
9      t.left:=r;
10     t.right:=l;
11     return 1+mirror(l)+mirror(r);
12   }
13 }

and suppose that class Tree implements binary trees in the standard way, with
fields left and right. The call mirror(t) exchanges the values of left and right
in each node of t, and returns the number of nodes in the tree. An initial state
is transformed by mirror as follows. Suppose that the current interpretation ι is
such that ι(mirror) = ξ, and ξ(∅) = ∅. The first branch of the if (when t is
null) does not change the initial abstract state; on the other hand, when t is
different from null, line 7 adds $t \rightsquigarrow l$; line 8 adds
$t \rightsquigarrow r$; line 9 adds again $t \rightsquigarrow r$; and line 10 adds
again $t \rightsquigarrow l$. Recursive calls mirror(l) and mirror(r) do not add any
statement since ξ(∅) = ∅. Finally, return adds nothing. Projecting
$\{t \rightsquigarrow l, t \rightsquigarrow r\}$ on t and out results in ∅, so that
ξ(∅) does not change, and there is no need for another iteration. It can be
concluded that, as expected, mirroring the tree does not make it cyclic.

Example 5.9. Consider the following method 
1  Node connect() {
2    Node curr;
3
4    curr:=this;
5    while (curr.next!=null) {
6      curr:=curr.next;
7    }
8    curr.next:=this;
9    return curr;
10 }

and assume it is defined in the class Node. A call l.connect() with l acyclic
makes the last element of l point to l, so that it becomes cyclic. It also
returns a reference to the last element in the list. An initial state is
transformed by connect as follows. Line 4 does not add any statements, while
line 6 in the loop adds $this \rightsquigarrow curr$. Another iteration of the loop
does not change anything, so that the loop is exited with
$\{this \rightsquigarrow curr\}$. Since this is now reaching curr, line 8 adds
$\{curr \rightsquigarrow this, curr \rightsquigarrow curr, this \rightsquigarrow this\}$
and $\{\circlearrowleft^{curr}, \circlearrowleft^{this}\}$. Finally, line 9 clones
curr to out. In conclusion, the analysis correctly infers that l.connect() makes
l and the return value cyclic.

5.3. Soundness 

This section presents the soundness theorem: the abstract state obtained by
applying the abstract semantics to a method in a given input abstract state is a
correct representation of (i.e., its concretization contains) the concrete state
obtained by executing the method in any input concrete state which is correctly
represented by such input abstract state. The proof of the theorem can be found
in Appendix A.4.

Theorem 5.10 (Soundness). Let P be a program, let m be a method in P, and let δ
and ξ be the denotations of m according to, respectively, the concrete semantics
of P (Definition 3.1) and its abstract semantics (Definition 5.7). It holds that,
for all $\sigma_1 \in \Sigma_{\hat m}$,

$$\sigma_2 = \delta(\sigma_1) \;\Rightarrow\; \langle \sigma_1[out \mapsto \sigma_2(out)], \sigma_2 \rangle \in \gamma_{rc}(\xi(\alpha_{rc}(\{\sigma_1\})))$$

5.4. Completeness and optimality 

Completeness [18] is a well-known notion in Abstract Interpretation, and
corresponds to requiring that no loss of precision is introduced by computing an
abstract semantic function on abstract states with respect to approximating the
same (concrete) computation on concrete states. An abstract domain A (with
abstraction function α and concretization function γ) and an abstract function
$f^\sharp$ over it are backward-complete for the concrete function f if and only
if, for every concrete input σ, the abstraction $\alpha(f(\sigma))$ of a concrete
computation is equal to the abstract computation $f^\sharp(\alpha(\sigma))$. This
property guarantees that $\alpha(lfp(f)) = lfp(f^\sharp)$.
By optimality we refer to the fact that the abstract function under study is the
best correct approximation of the concrete function with respect to the
associated abstraction: for every I, $f^\sharp(I)$ must be equal to
$\alpha(f(\gamma(I)))$.

For the sake of the following discussion, the abstract semantics
$\mathcal{C}^{rc}[\![\_]\!]$ (a similar discussion holds for
$\mathcal{E}^{rc}[\![\_]\!]$) is supposed to use, for collecting sharing, aliasing
and purity information, the best correct approximation $\mathcal{C}^{s}[\![\_]\!]$
of $\mathcal{C}_\tau[\![\_]\!]$ with respect to $\mathcal{I}^{s}_\tau$: for every
command com and abstract value $I \in \mathcal{I}^{s}_\tau$,
$\mathcal{C}^{s}[\![com]\!](I)$ is defined as
$\alpha_s(\mathcal{C}_\tau[\![com]\!](\gamma_s(I)))$. Introducing the abstract
semantics over this domain is necessary in order to properly talk about
completeness and optimality of the reachability and cyclicity analysis, as will
become clear in the following.

Backward completeness. The present analysis is not backward-complete. In the
following, the abstract domain under study will be
$\mathcal{I}^{rc}_\tau \sqcap \mathcal{I}^{s}_\tau$ (i.e., sharing, aliasing and
purity are included). Consider the state σ obtained by executing the following
statements, starting from a heap where all variables are null:

1 y:=new C;
2 z:=new C;
3 y.f:=new C;
4 z.f:=y.f;
5 y.g:=z;

In the final heap, y.f and z.f point to the same location, and y.g points to
the location of z.

After this code fragment, y and z share because they reach a common location,
and y is reaching z. Then, the most precise approximation of the resulting
concrete state σ is
$I = \{(y \bullet y), (z \bullet z), (y \bullet z), (y \cdot y), (z \cdot z), y \rightsquigarrow z\}$³.
Suppose that the statement

6 x:=y.f;

is executed afterward, giving the concrete state σ′: in this case, the concrete
function f under study is the semantics of this statement, namely,
$\mathcal{C}_\tau[\![x{:=}y.f]\!]$, and the state σ′ corresponds to
$\mathcal{C}_\tau[\![x{:=}y.f]\!](\sigma)$. Now, the abstraction of σ′ with respect
to $\mathcal{I}^{rc}_\tau \sqcap \mathcal{I}^{s}_\tau$ is

$$I' = \{(x \bullet x), (y \bullet y), (z \bullet z), (x \bullet y), (x \bullet z), (y \bullet z), (x \cdot x), (y \cdot y), (z \cdot z), y \rightsquigarrow x, y \rightsquigarrow z, z \rightsquigarrow x\}$$

which correctly represents the sharing between the three variables, and the fact
that x points exactly to the location which is reached by both y and z. On the
other hand, computing the result of the abstract semantics (i.e., the present
analysis $\mathcal{C}^{rc}[\![x{:=}y.f]\!]$) on the input abstract state I gives the
state

$$I'' = \{(x \bullet x), (y \bullet y), (z \bullet z), (x \bullet y), (x \bullet z), (y \bullet z), (x \cdot x), (y \cdot y), (z \cdot z), (x \cdot y), (x \cdot z), (y \cdot z), y \rightsquigarrow x, y \rightsquigarrow z, z \rightsquigarrow x, x \rightsquigarrow z\}$$

The reachability statement $x \rightsquigarrow z$ is added because the analysis
admits that, since y is said to reach z, the location pointed to by x could be
exactly on the path from y to z. Because of the difference between I′ and I″,
this counterexample is enough to prove the lack of backward completeness.

³ The notation $(\_ \bullet \_)$ and $(\_ \cdot \_)$ was introduced at the beginning
of Section 5.
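The counterexample can be double-checked at the concrete level. The following Python sketch uses a hypothetical toy heap model: locations are integers, each object maps field names to locations, and null fields are simply absent. It executes lines 1 to 6 and computes the concrete reachability statements $v \rightsquigarrow w$ (paths of length at least one). It confirms that y⇝x, y⇝z and z⇝x are justified in σ′, whereas x⇝z is not.

```python
from collections import deque
from typing import Dict, Set, Tuple

Heap = Dict[int, Dict[str, int]]   # location -> field -> location
Env = Dict[str, int]               # variable -> location (no nulls modelled)


def reachable(heap: Heap, loc: int) -> Set[int]:
    """Locations reachable from loc in one or more steps."""
    seen: Set[int] = set()
    todo = deque(heap[loc].values())
    while todo:
        l = todo.popleft()
        if l not in seen:
            seen.add(l)
            todo.extend(heap[l].values())
    return seen


def reach_statements(env: Env, heap: Heap) -> Set[Tuple[str, str]]:
    return {(v, w) for v, lv in env.items() for w, lw in env.items()
            if lw in reachable(heap, lv)}


heap: Heap = {}
env: Env = {}


def new() -> int:
    loc = len(heap)
    heap[loc] = {}
    return loc


env["y"] = new()                               # 1 y:=new C;
env["z"] = new()                               # 2 z:=new C;
heap[env["y"]]["f"] = new()                    # 3 y.f:=new C;
heap[env["z"]]["f"] = heap[env["y"]]["f"]      # 4 z.f:=y.f;
heap[env["y"]]["g"] = env["z"]                 # 5 y.g:=z;
env["x"] = heap[env["y"]]["f"]                 # 6 x:=y.f;
print(sorted(reach_statements(env, heap)))     # [('y', 'x'), ('y', 'z'), ('z', 'x')]
```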

Optimality. This section argues that two important abstract state transformers
included in the abstract semantics are optimal. The considered transformers are
$f_1^\sharp = \mathcal{E}^{rc}[\![v.f]\!]$ for field access, which is optimal with
respect to $f_1 = \mathcal{E}_\tau[\![v.f]\!]$, and
$f_2^\sharp = \mathcal{C}^{rc}[\![v.f{:=}\rho]\!]$ for field updates, which is
optimal with respect to $f_2 = \mathcal{C}_\tau[\![v.f{:=}\rho]\!]$. The use of ρ
means that the state transformers account for the field update after the
expression exp has been evaluated. In other words, $f_2$ will be applied to the
concrete state resulting from evaluating exp, and $f_2^\sharp$ will be applied to
the abstract state $I'_0$ described in Figure 5. In order to avoid confusion with
names, let J be the abstract value which is given as input to the abstract state
transformer, and let $J_1$ be the corresponding output; therefore, J′ and similar
names will play the same role as I′ and similar names in Figure 5.

Again, the abstract domain includes sharing, aliasing and purity, so that the
concretization and abstraction functions γ and α are the ones which are induced
by the reduced product $\mathcal{I}^{rc}_\tau \sqcap \mathcal{I}^{s}_\tau$ in the
standard way. This means that optimality is proven under the assumption that the
abstract operators of sharing, aliasing and purity are also optimal. It is
assumed that an abstract value contains sharing, aliasing and purity information,
together with reachability and cyclicity, and that it will be clear from the
context how to refer to each part.

By soundness, the inclusions
$\mathcal{E}^{rc}[\![v.f]\!](J) \supseteq \alpha(\mathcal{E}_\tau[\![v.f]\!](\gamma(J)))$
and $\mathcal{C}^{rc}[\![v.f{:=}\rho]\!](J) \supseteq \alpha(\mathcal{C}_\tau[\![v.f{:=}\rho]\!](\gamma(J)))$
already hold, where set inclusion is the partial order on
$\mathcal{I}^{rc}_\tau \sqcap \mathcal{I}^{s}_\tau$. Therefore, proving the claim
amounts to demonstrating the other direction of the inclusion, i.e., that, for
every reachability or cyclicity statement st contained in $J_1$, there is a
concrete state $\sigma \in \gamma(J)$ such that
$\sigma_1 = \mathcal{C}_\tau[\![v.f{:=}\rho]\!](\sigma)$ (the case of
$\mathcal{E}_\tau[\![v.f]\!](\sigma)$ is similar) is a concrete state whose
abstraction $\alpha(\{\sigma_1\})$ contains st. In other words, $\sigma_1$ is a
state where the may-information represented by st is actually happening (for
example, if st is some $v \rightsquigarrow w$, then there must actually be a path in
the heap of $\sigma_1$ from the location of v to that of w), so that the
abstraction of $\sigma_1$ will necessarily contain such a statement. In the proof,
this idea of "a statement st actually happening in a state σ" will be phrased as
"σ justifies st".

• Case $f_1^\sharp$: the output abstract state $J_1$ is basically the union of four
sets: (a) J; (b) $J[v/\rho]$; (c) $\{w \rightsquigarrow \rho \mid (w \bullet v) \in J_s\}$;
and (d) $\{\rho \rightsquigarrow \rho \mid \circlearrowleft^{v} \in J\}$. For every
one of them it is necessary to prove that, for every statement st contained in
it, there exists at least one concrete input state σ such that the corresponding
output state $\sigma_1 = f_1(\sigma)$ justifies st.

(a) Clearly, every statement st which was already in J, and is therefore
maintained in $J_1$, is justified by the fact that the structure of the heap does
not change when evaluating the expression: by hypothesis, there was already a
state σ justifying st, and the corresponding output $\sigma_1$ still justifies
such a statement.
(b) In this case, relevant statements in J can be of four kinds (other
statements, which do not involve v, are not relevant), and we need to prove that
the corresponding statements in $J[v/\rho]$ (where v is replaced by ρ) are
justified.

$v \rightsquigarrow w$: In this case, there certainly exists σ in the
concretization of J such that v actually reaches w in at least two steps, and
the first step goes through f; then, the location pointed to by the expression
actually reaches w (in fact, it is on the path from v to w), so that $\sigma_1$
justifies the statement $\rho \rightsquigarrow w$ contained in $J[v/\rho]$,
corresponding to $v \rightsquigarrow w$;

$w \rightsquigarrow v$: This case is easy since there exists σ such that w
actually reaches v, and it is straightforward to see that ρ will actually be
reached by w in $\sigma_1$ (transitivity of reachability at the concrete level),
thus justifying the corresponding statement $w \rightsquigarrow \rho$ in
$J[v/\rho]$;

$v \rightsquigarrow v$: This case is also easy because there certainly exists σ
such that v is cyclic, and the first step of the cycle when starting from v goes
through f; this means that v.f is still in the cycle, and the location pointed
to by the expression reaches itself, thus justifying the corresponding statement
$\rho \rightsquigarrow \rho$ in $J[v/\rho]$;

$\circlearrowleft^{v}$: This case is similar to the previous one.

(c) In this case, every $w \rightsquigarrow \rho$ must be justified, provided there
is sharing (this is a case where it becomes clear that sharing must also be
considered) between v and w in the input state. It is enough to take the same
(up to variable renaming) concrete state used in the discussion about backward
completeness, where v and w both reach (in one step, and through f) the same
location in the heap: the location pointed to by ρ in the output state comes to
be actually reached by w, thus justifying the statement.

(d) The last case is easy because it is enough to find some σ where v is cyclic
(but not necessarily reaching itself), and the location pointed to by v.f
reaches itself.

• Case of the field update $v.f := exp$: The first issue here is to note that optimality requires the single-
field optimization discussed in Section 5.2.3, where $I'$ is strictly smaller
than $I_0$ whenever it can be guaranteed that all the relevant reachability
or cyclicity paths have been broken by updating $v.f$. In fact, consider
the case where this optimization is not performed (i.e., $I' = I_0$). The
following piece of code

1 x := new C();
2 x.f := x;
3 x.f := null;



shows the lack of optimality under the condition that $f$ is the only field
of C. In fact, let the abstract value $I$ before line 3 be $\{x \rightsquigarrow x, \circlearrowleft^{x}\}$, as it
would be obtained by the analysis, so that $\gamma(I)$ contains all the states
where $x$ is cyclic and reaches itself. However, the abstract semantics without
the optimization would generate the same abstract value $\{x \rightsquigarrow x, \circlearrowleft^{x}\}$
as the final value. This is not optimal since any concrete state after executing
this code would have $x$ acyclic and not self-reaching, so that its
abstraction would be $\emptyset$ (in other words, none of the statements would be
justified). On the other hand, the aforementioned optimization removes
these statements from $I_0$, so that $I'$ is empty, thus achieving, in the end,
optimality.

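In the definition depicted in Figure 5, the output abstract state $I_1$ consists
of two more parts: (a) the one coming from

$\{w_1 \rightsquigarrow w_2 \mid ((w_1 \cdot v) \in I'_s \lor w_1 \rightsquigarrow v \in I') \land ((p \cdot w_2) \in I'_s \lor p \rightsquigarrow w_2 \in I')\}$;

and (b) the one coming from

$\{\circlearrowleft^{w} \mid (p \rightsquigarrow v \in I' \lor (p \cdot v) \in I'_s \lor \circlearrowleft^{p} \in I') \land ((w \cdot v) \in I'_s \lor w \rightsquigarrow v \in I')\}$.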

(a) In order to justify a statement $w_1 \rightsquigarrow w_2$, it is enough to take a concrete
state $\sigma \in \gamma(I')$ (which clearly exists) where $w_1$ is actually reaching $v$
and the location pointed to by the result of the expression is actually
reaching $w_2$. In this case, the field update will create a path from $w_1$
to $w_2$ in $\sigma_1$, so that the statement is justified.

(b) A statement $\circlearrowleft^{w}$ can be easily justified by taking $\sigma$ such that the
result of the expression points to an actually cyclic data structure,
and $w$ actually reaches $v$. Then, the newly created path will make $w$
cyclic.

The final elimination of p is not problematic. 
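To make the optimality counterexample concrete, the following Python sketch models a tiny heap and computes, for a single concrete state, the reachability statements and the cyclic variables. It assumes, for illustration only, that a variable is cyclic when some cycle is reachable from the location it points to; the helpers `reachable` and `alpha` are hypothetical and are not part of COSTA or of the formal definitions. Running the three lines of the example shows that, after line 3, no statement is justified any more.

```python
from typing import Dict, Optional, Set, Tuple

Heap = Dict[int, Dict[str, Optional[int]]]  # location -> field name -> location (None = null)
Env = Dict[str, Optional[int]]              # variable -> location (None = null)


def reachable(heap: Heap, loc: int) -> Set[int]:
    """Locations reachable from loc by following one or more fields."""
    seen: Set[int] = set()
    stack = [t for t in heap[loc].values() if t is not None]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(t for t in heap[cur].values() if t is not None)
    return seen


def alpha(env: Env, heap: Heap) -> Tuple[Set[Tuple[str, str]], Set[str]]:
    """Abstraction of one state: (pairs (v, w) with v reaching w, cyclic variables)."""
    reach: Set[Tuple[str, str]] = set()
    cyc: Set[str] = set()
    for v, lv in env.items():
        if lv is None:
            continue
        from_v = reachable(heap, lv)
        reach |= {(v, w) for w, lw in env.items() if lw is not None and lw in from_v}
        # v is cyclic if a location reachable from (or equal to) lv lies on a cycle
        if any(l in reachable(heap, l) for l in from_v | {lv}):
            cyc.add(v)
    return reach, cyc


env: Env = {"x": None}
heap: Heap = {}
heap[0] = {"f": None}        # line 1: x := new C();  (f is the only field of C)
env["x"] = 0
heap[0]["f"] = env["x"]      # line 2: x.f := x;
after_line_2 = alpha(env, heap)
heap[0]["f"] = None          # line 3: x.f := null;
after_line_3 = alpha(env, heap)
```

Here `after_line_2` is `({("x", "x")}, {"x"})`, matching the abstract value before line 3, while `after_line_3` is `(set(), set())`, i.e., the abstraction of every concrete result is empty.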

5.5. Note on an implementation 

The present analysis has been implemented in COSTA [2], the COSt and
Termination Analyzer. The implementation works as a component of COSTA,
and handles programs written in full sequential Java bytecode, which includes
control flow that originates from the handling of exceptions. Static fields are
accounted for as a kind of global variable: for every class $k$ and
static field $f$, a global variable $v_{k.f}$ is added to the analysis (note that the set
of such global variables is statically decidable by simply inspecting the program
code). The acyclicity information is used by COSTA to prove the termination
or infer the resource usage of programs.
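As an illustration of why the set of static-field globals is statically decidable, the hypothetical sketch below collects one global variable per static field accessed through the bytecode instructions `getstatic` or `putstatic`. The tuple encoding of instructions and the `v_k.f` naming scheme are assumptions made for this example; this is not how COSTA represents bytecode.

```python
from typing import Iterable, Set, Tuple

# Hypothetical instruction encoding: (opcode, class name, field name).
Instr = Tuple[str, str, str]


def static_field_globals(code: Iterable[Instr]) -> Set[str]:
    """Return one global variable name v_k.f for each static field k.f accessed in the code."""
    return {f"v_{k}.{f}" for op, k, f in code if op in ("getstatic", "putstatic")}


program = [
    ("getstatic", "List", "head"),
    ("getfield", "Node", "next"),
    ("putstatic", "List", "head"),
    ("putstatic", "Cache", "root"),
]
globals_needed = static_field_globals(program)
```

The scan is purely syntactic, so `globals_needed` is `{"v_List.head", "v_Cache.root"}` regardless of which paths are executed at run time.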

It is worth mentioning that the implementation is a prototype, and that
it can be optimized in many ways. In fact, the present paper focuses on the
theoretical definition of an existing analysis, so that the implementation is not
the most important issue. As a matter of fact, this implementation deals with
a different language with respect to the original implementation; this implies,
for example, having to account in a specific way for advanced features of Java
and Java bytecode like objects, exceptions, and static fields. The single-field
optimization discussed in Section 5.2.3 is not implemented.
6. Conclusions



This paper discusses an acyclicity analysis of a Java-like language with mu- 
table data structures, based on reachability between variables. In particular, the 
main focus of the paper is on the formalization of an existing analysis within 
the framework of Abstract Interpretation. The proposed acyclicity analysis is 
based on the observation that a field update x.f=y might create a new cycle iff 
y reaches x or aliases with it before the command. Two abstract domains are 
first defined, which capture the may-reach and may-be-cyclic properties. Then,
an abstract semantics which works on their reduced product is introduced: it 
uses reachability information to improve the detection of cyclicity, and cyclicity 
to improve the tracking of reachability. 

The analysis is proven to be sound; i.e., no cyclic data structure is ever
considered acyclic. It is also proven to be the best correct approximation of 
the concrete semantics with respect to the chosen abstraction. Moreover, it 
can be shown to obtain precise results in a number of non-trivial scenarios, 
where the sharing-based approach is less precise [26]. Indeed, since the existence 
of a directed path between the locations bound to two variables implies that 
such variables share, the proposed reachability-based analysis will never be less 
precise than the sharing-based approach. In particular, it is worth noticing that 
the reachability-based approach can often deal with directed acyclic graphs, 
whereas sharing-based techniques will consider, in general, any DAG as cyclic. 
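The difference between the two approaches on DAGs can be seen on a small heap. In the hypothetical sketch below, `x` and `y` point to distinct objects that both reach a common object through a field `g`; a sharing-based check flags `x.f := y` as possibly creating a cycle, whereas the reachability-based condition (y reaches x or aliases with it) does not, and executing the update indeed yields an acyclic heap. The heap encoding, the field names, and the helper functions are illustrative assumptions.

```python
from typing import Dict, Optional, Set

Heap = Dict[int, Dict[str, Optional[int]]]  # location -> field name -> location (None = null)
Env = Dict[str, Optional[int]]


def reachable(heap: Heap, loc: int) -> Set[int]:
    """Locations reachable from loc by following one or more fields."""
    seen: Set[int] = set()
    stack = [t for t in heap[loc].values() if t is not None]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(t for t in heap[cur].values() if t is not None)
    return seen


def shares(heap: Heap, env: Env, a: str, b: str) -> bool:
    """a and b share if the regions reachable from them (own location included) overlap."""
    la, lb = env[a], env[b]
    if la is None or lb is None:
        return False
    return bool((reachable(heap, la) | {la}) & (reachable(heap, lb) | {lb}))


def reaches(heap: Heap, env: Env, a: str, b: str) -> bool:
    la, lb = env[a], env[b]
    return la is not None and lb is not None and lb in reachable(heap, la)


def has_cycle(heap: Heap) -> bool:
    return any(loc in reachable(heap, loc) for loc in heap)


# x and y point to distinct objects that both reach object 2 through field g.
heap: Heap = {0: {"f": None, "g": 2}, 1: {"f": None, "g": 2}, 2: {"f": None, "g": None}}
env: Env = {"x": 0, "y": 1}

sharing_says_may_cycle = shares(heap, env, "y", "x")
reach_says_may_cycle = reaches(heap, env, "y", "x") or env["x"] == env["y"]

heap[0]["f"] = env["y"]  # execute x.f := y
cyclic_after = has_cycle(heap)
```

After the update the heap is the DAG `0 -> 1 -> 2` plus `0 -> 2`, so only the sharing-based test raises a false alarm.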
Appendix A. Proofs



This appendix includes proofs for: Lemma 4.3 in Appendix A.1; Lemma 4.6
in Appendix A.2; Lemma 4.7 in Appendix A.3; and Theorem 5.10 in Appendix A.4.

Appendix A.1. Proof of Lemma 4.3

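Due to the definition of Galois insertion, the result to prove amounts to saying
that both

(a) $\forall I_r \in \mathcal{I}_r.\ \alpha_r(\gamma_r(I_r)) = I_r$ and (b) $\forall I^\flat \in \mathcal{I}^\flat.\ \gamma_r(\alpha_r(I^\flat)) \supseteq I^\flat$

hold, where $\subseteq$ is the ordering on $\mathcal{I}^\flat$.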

Part (a). We show that $v \rightsquigarrow w \in I_r \Leftrightarrow v \rightsquigarrow w \in \alpha_r(\gamma_r(I_r))$. ($\Rightarrow$) Assume $v \rightsquigarrow w \in I_r$;
then, according to the definition of $\mathcal{I}_r$ and class reachability, there must be
a concrete state $\sigma$ in which $v$ reaches $w$ since, otherwise, the statement
$v \rightsquigarrow w$ could not be part of the domain $\mathcal{I}_r$. We construct a state $\sigma'$ from $\sigma$ by
setting all reference variables but $v$ and $w$ to null. By the definition of $\gamma_r$, this
specific $\sigma'$ must be in $\gamma_r(I_r)$. This, according to the definition of $\alpha_r$, implies
that $v \rightsquigarrow w \in \alpha_r(\gamma_r(I_r))$. ($\Leftarrow$) Assume $v \rightsquigarrow w \in \alpha_r(\gamma_r(I_r))$. According to the
definition of $\alpha_r$, this means that there exists at least one $\sigma \in \gamma_r(I_r)$ in which
$v$ reaches $w$, and, according to the definition of $\gamma_r$, this can only happen if
$v \rightsquigarrow w \in I_r$.

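Part (b). We show that $\sigma \in I^\flat \Rightarrow \sigma \in \gamma_r(\alpha_r(I^\flat))$. Let $\sigma \in I^\flat$, and let $I_r$ be the
set of all reachability relations in $\sigma$, i.e., $v$ reaches $w$ in $\sigma$ iff $v \rightsquigarrow w \in I_r$. Clearly,
$I_r \subseteq \alpha_r(I^\flat)$. Then, according to the definition of $\gamma_r$, $\sigma$ must be in $\gamma_r(\alpha_r(I^\flat))$
since it satisfies $\forall v, w \in \tau.\ v$ reaches $w$ in $\sigma \Rightarrow v \rightsquigarrow w \in \alpha_r(I^\flat)$. □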
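Lemma 4.3 can be sanity-checked by brute force on a bounded universe. The sketch below is a toy model and not the paper's domain: it assumes two reference variables, two heap locations, and a single field, and it ignores types and class reachability. It enumerates all states, takes $\gamma_r(I_r)$ to be the states whose reachability statements are all in $I_r$ and $\alpha_r$ to be the union of the statements of a set of states, then checks part (a) for every $I_r$ and part (b) on singleton sets of states.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

LOCS = (0, 1)        # two heap locations (toy assumption)
VARS = ("v", "w")    # two reference variables (toy assumption)
Stmt = Tuple[str, str]  # (a, b) encodes a ~> b
State = Tuple[Dict[str, Optional[int]], Dict[int, Optional[int]]]  # (env, single field f)


def reachable(field: Dict[int, Optional[int]], loc: int) -> Set[int]:
    """Locations reachable from loc in one or more steps through the single field."""
    seen: Set[int] = set()
    cur = field[loc]
    while cur is not None and cur not in seen:
        seen.add(cur)
        cur = field[cur]
    return seen


def alpha_state(state: State) -> FrozenSet[Stmt]:
    env, field = state
    return frozenset(
        (a, b)
        for a in VARS
        for b in VARS
        if env[a] is not None
        and env[b] is not None
        and env[b] in reachable(field, env[a])
    )


STATES: List[State] = [
    ({"v": lv, "w": lw}, {0: f0, 1: f1})
    for f0, f1 in product((None,) + LOCS, repeat=2)
    for lv, lw in product((None,) + LOCS, repeat=2)
]


def gamma_r(i_r: Set[Stmt]) -> List[State]:
    return [s for s in STATES if alpha_state(s) <= i_r]


def alpha_r(states: List[State]) -> Set[Stmt]:
    out: Set[Stmt] = set()
    for s in states:
        out |= alpha_state(s)
    return out


def check_galois_insertion() -> bool:
    stmts = [(a, b) for a in VARS for b in VARS]
    for k in range(len(stmts) + 1):
        for combo in combinations(stmts, k):
            if alpha_r(gamma_r(set(combo))) != set(combo):  # part (a)
                return False
    return all(s in gamma_r(alpha_r([s])) for s in STATES)  # part (b), singletons
```

Part (a) holds here because every single statement, such as $v \rightsquigarrow w$ alone, is realized by some state in isolation, which mirrors the role of the state $\sigma'$ in the proof.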

Appendix A.2. Proof of Lemma 4.6

Very similar to the proof of Lemma 4.3. □ 

Appendix A.3. Proof of Lemma 4.7

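($\Rightarrow$). We show that:

$\underbrace{\gamma(\langle I_r^1, I_c^1\rangle) = \gamma(\langle I_r^2, I_c^2\rangle)}_{F} \Rightarrow \underbrace{I_c^1 = I_c^2}_{G} \land \underbrace{I_r^1 \setminus \{v \rightsquigarrow v \mid \circlearrowleft^{v} \notin I_c^1\} = I_r^2 \setminus \{v \rightsquigarrow v \mid \circlearrowleft^{v} \notin I_c^2\}}_{H}$

First, note that the logical formula $F \Rightarrow (G \land H)$ is equivalent to $(\lnot G \Rightarrow \lnot F) \land (\lnot H \Rightarrow \lnot F)$. The proof is by contraposition, and consists of two parts:

1. proving that $I_c^1 \neq I_c^2$ implies $\gamma(\langle I_r^1, I_c^1\rangle) \neq \gamma(\langle I_r^2, I_c^2\rangle)$; and

2. proving that $(I_r^1 \setminus \{v \rightsquigarrow v \mid \circlearrowleft^{v} \notin I_c^1\}) \neq (I_r^2 \setminus \{v \rightsquigarrow v \mid \circlearrowleft^{v} \notin I_c^2\})$ implies $\gamma(\langle I_r^1, I_c^1\rangle) \neq \gamma(\langle I_r^2, I_c^2\rangle)$.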
The proof goes as follows.

1. Suppose $I_c^1 \neq I_c^2$, and let $X_1 = \{v \mid \circlearrowleft^{v} \in I_c^1 \setminus I_c^2\}$ and $X_2 = \{v \mid \circlearrowleft^{v} \in I_c^2 \setminus I_c^1\}$. Note that, by hypothesis, at least one of $X_1$ and $X_2$ must be non-empty. For $i \in \{1,2\}$, let $\sigma_i$ be a state where

(a) every $w \in X_i$ is cyclic, but does not reach itself, and no other variable is cyclic; and

(b) no variable reaches any variable, i.e., $\alpha_r(\{\sigma_i\}) = \emptyset$. Note that this requirement is consistent, since the cyclicity of some variables (in this case, those in $X_i$) does not necessarily imply the existence of a reachability path between variables.

It is easy to see that $\sigma_1$ and $\sigma_2$ both belong to $\gamma_r(I_r^1) \cap \gamma_r(I_r^2)$, since they do not include any reachability statement; therefore, if $X_1 \neq \emptyset$, then $\sigma_1$ belongs to $\gamma(\langle I_r^1, I_c^1\rangle)$ but not to $\gamma(\langle I_r^2, I_c^2\rangle)$, since $\langle I_r^2, I_c^2\rangle$ does not allow the cyclicity of variables from $X_1$. Dually, if $X_2 \neq \emptyset$, then $\sigma_2$ belongs to $\gamma(\langle I_r^2, I_c^2\rangle)$ but not to $\gamma(\langle I_r^1, I_c^1\rangle)$.

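2. Suppose $R_1 = I_r^1 \setminus \{v \rightsquigarrow v \mid \circlearrowleft^{v} \notin I_c^1\}$ is different from $R_2 = I_r^2 \setminus \{v \rightsquigarrow v \mid \circlearrowleft^{v} \notin I_c^2\}$, and let $S_1 = R_1 \setminus R_2$ and $S_2 = R_2 \setminus R_1$. Note that at least one between $S_1$ and $S_2$ is non-empty. If $S_1$ is not empty, then let $\rho \in S_1$ be one of the statements which are in $R_1$ but not in $R_2$. A state $\sigma_1$ can be chosen such that

(a) if $\rho = v \rightsquigarrow v$, then $v$ reaches itself, no other variable reaches any variable, and $v$ is the only cyclic variable in $\sigma_1$ (note that the cyclicity of $v$ must be allowed by $I_c^1$ since, otherwise, $\rho$ would not be included in $R_1$, and thus not in $S_1$ either); and

(b) if $\rho = v \rightsquigarrow w$, with $v \neq w$, then $v$ must reach $w$ in $\sigma_1$, and no other variable reaches any other variable. Also, no variable can be cyclic.

Clearly, in both cases above such a state belongs to $\gamma(\langle I_r^1, I_c^1\rangle)$, but it cannot be in $\gamma(\langle I_r^2, I_c^2\rangle)$ because: in (a), either $v \rightsquigarrow v \notin I_r^2$ (so that $\sigma_1 \notin \gamma_r(I_r^2)$), or $\circlearrowleft^{v} \notin I_c^2$ (so that $\sigma_1 \notin \gamma_c(I_c^2)$); and, in (b), $v \rightsquigarrow w \notin I_r^2$, so that $\sigma_1 \notin \gamma_r(I_r^2)$. Dually, if $S_1$ is empty, then $S_2$ cannot be empty, and, with a similar reasoning, a state $\sigma_2$ can be found which belongs to $\gamma(\langle I_r^2, I_c^2\rangle)$ but not to $\gamma(\langle I_r^1, I_c^1\rangle)$.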

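($\Leftarrow$). We prove that:

$I_c^1 = I_c^2 \land (I_r^1 \setminus \{v \rightsquigarrow v \mid \circlearrowleft^{v} \notin I_c^1\}) = (I_r^2 \setminus \{v \rightsquigarrow v \mid \circlearrowleft^{v} \notin I_c^2\}) \Rightarrow \gamma(\langle I_r^1, I_c^1\rangle) = \gamma(\langle I_r^2, I_c^2\rangle)$

It follows easily from observing that, under the hypothesis of the above implication, the only difference between $\langle I_r^1, I_c^1\rangle$ and $\langle I_r^2, I_c^2\rangle$ is that $I_r^1$ may contain some statements $v \rightsquigarrow v$ for variables $v$ such that $\circlearrowleft^{v} \notin I_c^1$, and $I_r^2$ may contain some (different) statements $v \rightsquigarrow v$ for variables $v$ such that $\circlearrowleft^{v} \notin I_c^2$. However, adding such statements to either abstract value does not change the set of concrete states it represents, since the possibility that $v$ reaches itself in any concrete state is contradicted by the lack of the $\circlearrowleft^{v}$ statement. In other words, there is no concrete state which belongs to either $\gamma_{rc}(\langle I_r^1, I_c^1\rangle)$ or $\gamma_{rc}(\langle I_r^2, I_c^2\rangle)$, but not to $\gamma_{rc}(\langle I_r^1 \setminus \{v \rightsquigarrow v \mid \circlearrowleft^{v} \notin I_c^1\}, I_c^1\rangle)$ and $\gamma_{rc}(\langle I_r^2 \setminus \{v \rightsquigarrow v \mid \circlearrowleft^{v} \notin I_c^2\}, I_c^2\rangle)$ (which are equal by the hypothesis). □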
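Lemma 4.7 suggests a canonical form for comparing abstract values: drop every self-reachability statement $v \rightsquigarrow v$ whose variable lacks $\circlearrowleft^{v}$, then compare componentwise. The sketch below encodes $I_r$ as a set of pairs and $I_c$ as the set of variables carrying a cyclicity statement; this encoding and the function names are assumptions made for illustration.

```python
from typing import FrozenSet, Set, Tuple

Stmt = Tuple[str, str]              # (a, b) encodes a ~> b
AbsVal = Tuple[Set[Stmt], Set[str]]  # (I_r, I_c), I_c as the set of possibly cyclic variables


def normalize(value: AbsVal) -> Tuple[FrozenSet[Stmt], FrozenSet[str]]:
    """Drop v ~> v whenever the cyclicity statement for v is absent."""
    reach, cyc = value
    kept = frozenset((a, b) for (a, b) in reach if a != b or a in cyc)
    return kept, frozenset(cyc)


def same_concretization(i1: AbsVal, i2: AbsVal) -> bool:
    """Lemma 4.7: equal concretizations iff equal normal forms."""
    return normalize(i1) == normalize(i2)


# x ~> x is redundant without the cyclicity statement for x ...
eq = same_concretization(({("x", "x"), ("x", "y")}, set()), ({("x", "y")}, set()))
# ... but not when x may be cyclic.
neq = same_concretization(({("x", "x")}, {"x"}), (set(), {"x"}))
```

Comparing normal forms is what makes the reduced product well defined: each equivalence class of abstract values with the same concretization has exactly one normalized representative.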
Appendix A.4. Proof of Theorem 5.10

This proof of soundness amounts to proving the soundness of all abstract
denotations for expressions and commands, assuming that a current interpretation
$\mathcal{L}$ and a corresponding abstract one $\mathcal{L}^{\alpha}$ which correctly approximates $\mathcal{L}$ are
available. Then, a simple induction can be applied to show that the abstract
semantics of Definition 5.7 correctly approximates the concrete semantics of
Definition 3.1 (the induction step basically applies the denotations to the elements
of $\mathcal{L}$ and $\mathcal{L}^{\alpha}$).

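In the following, let $\sigma$ be a concrete state, $com$ be a command, $exp$ be an
expression, and $\sigma^*$ be the state obtained by executing $com$ or evaluating $exp$
in $\sigma$. The soundness of the abstract denotations for expressions and commands
amounts to saying that, if $I \in \mathcal{I}_{rc}$ correctly approximates $\sigma$, i.e., $\sigma \in \gamma_{rc}(I)$,
then the abstract state $I^* = \mathcal{C}^{\alpha}[\![com]\!](I)$ (or $I^* = \mathcal{E}^{\alpha}[\![exp]\!](I)$, in the case of
expressions) correctly approximates $\sigma^*$. Formally, we show that

1. $\forall \sigma \in \Sigma_\tau, I \in \mathcal{I}_{rc}^{\tau}.\ \sigma \in \gamma_{rc}^{\tau}(I) \Rightarrow \mathcal{E}[\![exp]\!](\sigma) \in \gamma_{rc}^{\tau'}(\mathcal{E}^{\alpha}[\![exp]\!](I))$

2. $\forall \sigma \in \Sigma_\tau, I \in \mathcal{I}_{rc}^{\tau}.\ \sigma \in \gamma_{rc}^{\tau}(I) \Rightarrow \mathcal{C}[\![com]\!](\sigma) \in \gamma_{rc}^{\tau}(\mathcal{C}^{\alpha}[\![com]\!](I))$

Note that, if $\sigma^*$ is obtained after evaluating an expression, then $p \in dom(\sigma^*)$,
while, if it is obtained after executing a command, then $dom(\sigma^*) = dom(\sigma)$.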

The soundness proof considers separately the rules of the abstract semantics
$\mathcal{E}^{\alpha}[\![\_]\!](\_)$ and $\mathcal{C}^{\alpha}[\![\_]\!](\_)$. When some logical fact is said to hold by soundness, it
means that it holds by the hypothesis on the input (i.e., that $\sigma \in \gamma_{rc}(I)$ holds),
or by induction on sub-expressions or sub-commands. For example, the fact
that $v$ reaches $w$ in $\sigma$ implies $v \rightsquigarrow w \in I$ by soundness, since $I$ is supposed to be
a sound description of $\sigma$.

Denotations (1e), (2e), and (3e). Suppose $\sigma^* \notin \gamma_{rc}^{\tau'}(I^*)$. Then, according
to the definition of $\gamma_{rc}^{\tau'}$, it must be the case that (i) $w_1$ reaches $w_2$ in $\sigma^*$
but $w_1 \rightsquigarrow w_2 \notin I^*$; or (ii) $w$ is cyclic in $\sigma^*$ but $\circlearrowleft^{w} \notin I^*$. This contradicts the
soundness hypothesis $\sigma \in \gamma_{rc}(I)$, since $I^* = I$ and $\sigma$ and $\sigma^*$ have the same
reachability and cyclicity information¹.

Denotation (4e). Assume $\tau(v) \neq int$, otherwise the reasoning we developed
for case (1e) applies. Note that this case does not have any side effects, except
defining the new variable $p$. If $\sigma^* \notin \gamma_{rc}^{\tau'}(I^*)$, then, according to the definition
of $\gamma_{rc}^{\tau'}$, it must be the case that (i) $w_1$ reaches $w_2$ in $\sigma^*$ but $w_1 \rightsquigarrow w_2 \notin I^*$; or
(ii) $w$ is cyclic in $\sigma^*$ but $\circlearrowleft^{w} \notin I^*$. Suppose we are in case (i):

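• If $w_1 \neq p \land w_2 \neq p$, then $\sigma(w_2) = \sigma^*(w_2) \in R(\sigma^*(w_1), \sigma^*) = R(\sigma(w_1), \sigma)$,
i.e., $w_1$ reaches $w_2$ in $\sigma$. By the soundness hypothesis $\sigma \in \gamma_{rc}(I)$, we have
$w_1 \rightsquigarrow w_2 \in I \subseteq I^*$, which contradicts $w_1 \rightsquigarrow w_2 \notin I^*$.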



¹Note that, unlike in Java, the simple act of creating an object does not involve, in itself,
any action on its content, i.e., there are no side effects due to the constructor.
• If $w_1 = p \land w_2 \neq p$, then $\sigma(w_2) = \sigma^*(w_2) \in R(\sigma^*(p), \sigma^*) = R(\sigma(v), \sigma)$,
i.e., $v$ reaches $w_2$ in $\sigma$. By the soundness hypothesis $\sigma \in \gamma_{rc}(I)$, we have
$v \rightsquigarrow w_2 \in I$ and thus $p \rightsquigarrow w_2 \in I[v/p] \subseteq I^*$, which contradicts $p \rightsquigarrow w_2 \notin I^*$.

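• If $w_1 \neq p \land w_2 = p$, then $\sigma(v) = \sigma^*(p) \in R(\sigma^*(w_1), \sigma^*) = R(\sigma(w_1), \sigma)$,
i.e., $w_1$ reaches $v$ in $\sigma$. By the soundness hypothesis $\sigma \in \gamma_{rc}(I)$, we have
$w_1 \rightsquigarrow v \in I$ and thus $w_1 \rightsquigarrow p \in I[v/p] \subseteq I^*$, which contradicts $w_1 \rightsquigarrow p \notin I^*$.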

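• If $w_1 = p \land w_2 = p$, then $\sigma(v) = \sigma^*(p) \in R(\sigma^*(p), \sigma^*) = R(\sigma(v), \sigma)$, i.e., $v$
reaches $v$ in $\sigma$. By the soundness hypothesis $\sigma \in \gamma_{rc}(I)$, we have $v \rightsquigarrow v \in I$
and thus $p \rightsquigarrow p \in I[v/p] \subseteq I^*$, which contradicts $p \rightsquigarrow p \notin I^*$.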

For case (ii), the reasoning is basically the same as for (i), considering cyclicity
instead of reachability.
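The four bullets above only use the fact that every statement about $v$ has a counterpart about $p$ in $I^*$. Since the definition of $I[v/p]$ is not repeated here, the sketch below implements one reading that is consistent with all four bullets, and it is an assumption: each statement mentioning $v$ is also copied with any non-empty subset of its occurrences of $v$ replaced by $p$, so that $v \rightsquigarrow v$ yields $v \rightsquigarrow p$, $p \rightsquigarrow v$, and $p \rightsquigarrow p$, and $\circlearrowleft^{v}$ yields $\circlearrowleft^{p}$. The function name is hypothetical.

```python
from typing import Set, Tuple

Stmt = Tuple[str, str]  # (a, b) encodes a ~> b


def clone_var(reach: Set[Stmt], cyc: Set[str], v: str, p: str) -> Tuple[Set[Stmt], Set[str]]:
    """Keep I and add copies of every statement about v with some occurrences of v renamed to p."""
    out = set(reach)
    for a, b in reach:
        for a2 in ({a, p} if a == v else {a}):
            for b2 in ({b, p} if b == v else {b}):
                out.add((a2, b2))
    out_cyc = set(cyc) | ({p} if v in cyc else set())
    return out, out_cyc


reach_star, cyc_star = clone_var({("v", "v"), ("v", "w"), ("u", "v")}, {"v"}, "v", "p")
```

With this reading, the bullet for $w_1 \neq p \land w_2 = p$ with $w_1 = v$ is covered by $v \rightsquigarrow p$, which a plain renaming of $v$ into $p$ would not produce.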

Denotation (5e). Assume $f$ is of reference type, otherwise the reasoning we
have done for case (1e) applies. Note that this case does not have any side
effects, except defining the new variable $p$. If $\sigma^* \notin \gamma_{rc}^{\tau'}(I^*)$, then, according
to the definition of $\gamma_{rc}^{\tau'}$, it must be the case that (i) $w_1$ reaches $w_2$ in $\sigma^*$ but
$w_1 \rightsquigarrow w_2 \notin I^*$; or (ii) $w$ is cyclic in $\sigma^*$ but $\circlearrowleft^{w} \notin I^*$. Suppose we are in case (i):

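• If $w_1 \neq p \land w_2 \neq p$, then $\sigma(w_2) = \sigma^*(w_2) \in R(\sigma^*(w_1), \sigma^*) = R(\sigma(w_1), \sigma)$,
i.e., $w_1$ reaches $w_2$ in $\sigma$. By the soundness hypothesis $\sigma \in \gamma_{rc}(I)$, we have
$w_1 \rightsquigarrow w_2 \in I \subseteq I^*$, which contradicts $w_1 \rightsquigarrow w_2 \notin I^*$.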

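• If $w_1 = p \land w_2 \neq p$, then $\sigma(w_2) = \sigma^*(w_2) \in R(\sigma^*(p), \sigma^*) \subseteq R(\sigma(v), \sigma)$,
i.e., $v$ reaches $w_2$ in $\sigma$. By the soundness hypothesis $\sigma \in \gamma_{rc}(I)$, we have
$v \rightsquigarrow w_2 \in I$ and thus $p \rightsquigarrow w_2 \in \{p \rightsquigarrow w \mid v \rightsquigarrow w \in I\} \subseteq I^*$, which contradicts $p \rightsquigarrow w_2 \notin I^*$.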

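• If $w_1 \neq p \land w_2 = p$, then $\sigma^*(p) \in R(\sigma^*(w_1), \sigma^*) = R(\sigma(w_1), \sigma)$; we also
have $\sigma^*(p) \in R(\sigma(v), \sigma)$ (since $p = v.f$), i.e., $w_1$ shares with $v$ in $\sigma$, so that
$(w_1 \cdot v) \in I_s$ by soundness. Thus, $w_1 \rightsquigarrow p \in \{w \rightsquigarrow p \mid (w \cdot v) \in I_s\} \subseteq I^*$, which contradicts $w_1 \rightsquigarrow p \notin I^*$.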

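• If $w_1 = p \land w_2 = p$, then $\sigma^*(p) \in R(\sigma^*(p), \sigma^*)$, which means that $v$ is
cyclic in $\sigma$; by the soundness hypothesis we have $\circlearrowleft^{v} \in I$, and thus
$p \rightsquigarrow p \in \{p \rightsquigarrow p \mid \circlearrowleft^{v} \in I\} \subseteq I^*$, which contradicts $p \rightsquigarrow p \notin I^*$.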

For case (ii), the reasoning is basically the same as for (i), considering cyclicity
instead of reachability.
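Reading back from the bullets, the abstract result for the field access $v.f$ (stored in $p$) must contain at least $I$, $\{p \rightsquigarrow w \mid v \rightsquigarrow w \in I\}$, $\{w \rightsquigarrow p \mid (w \cdot v) \in I_s\}$, and $p \rightsquigarrow p$ when $\circlearrowleft^{v} \in I$. The sketch below assembles exactly these sets. The encoding of sharing as unordered pairs (a variable "shares with itself" when it is non-null), the extra rule adding $\circlearrowleft^{p}$ when $\circlearrowleft^{v} \in I$ (if $p$ is cyclic then so is $v$, which reaches it), and the function name are assumptions; the actual rule may contain further components.

```python
from typing import FrozenSet, Set, Tuple

Stmt = Tuple[str, str]          # (a, b) encodes a ~> b
Sharing = Set[FrozenSet[str]]   # frozenset({a, b}) encodes (a . b); frozenset({a}) means a is non-null


def field_access(reach: Set[Stmt], cyc: Set[str], sh: Sharing,
                 variables: Set[str], v: str, p: str) -> Tuple[Set[Stmt], Set[str]]:
    """Statements that p := v.f must produce, read back from the soundness cases."""
    out = set(reach)                                                 # I itself
    out |= {(p, w) for (a, w) in reach if a == v}                    # p ~> w if v ~> w
    out |= {(w, p) for w in variables if frozenset({w, v}) in sh}    # w ~> p if (w . v)
    if v in cyc:
        out.add((p, p))                                              # p ~> p if v may be cyclic
    out_cyc = set(cyc) | ({p} if v in cyc else set())                # p cyclic only if v is
    return out, out_cyc


sh = {frozenset({"v"}), frozenset({"w"}), frozenset({"u"}), frozenset({"v", "w"})}
reach_star, cyc_star = field_access({("v", "w")}, set(), sh, {"u", "v", "w"}, "v", "p")
```

Here `u` does not share with `v`, so no statement `u ~> p` is produced, while both `v` and `w` are said to possibly reach `p`.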

Denotation (6e). The proof for this case is by structural induction on expressions,
where the base cases include the non-compound expressions of cases
(1e)-(5e) and (7e), for which we have already seen (case (7e) is done below)
that the abstract denotations correctly approximate the concrete ones. Let
$I_1 = \mathcal{E}^{\alpha}[\![exp_1]\!](I)$ and $\sigma_1 = \mathcal{E}[\![exp_1]\!](\sigma)$. By the (structural) induction hypothesis,
we have $\sigma_1 \in \gamma_{rc}^{\tau'}(I_1)$. Moreover, since the state $(\sigma, \sigma_1)$ is basically
obtained by removing $p$ from $\sigma_1$, we also have $(\sigma, \sigma_1) \in \gamma_{rc}^{\tau}(\exists p.I_1)$. Now, let
$I_2 = \mathcal{E}^{\alpha}[\![exp_2]\!](\exists p.I_1)$ and $\sigma_2 = \mathcal{E}[\![exp_2]\!]((\sigma, \sigma_1))$; then, by the (structural)
induction hypothesis, we have $\sigma_2 \in \gamma_{rc}^{\tau'}(I_2)$. Since $\sigma^*$ is obtained from $\sigma_2$ by
setting $p$ to a number (i.e., there are no reachability or cyclicity relations in $\sigma^*$
that involve $p$), and since $I^* = \exists p.I_2$, we can conclude that $\sigma^* \in \gamma_{rc}^{\tau'}(I^*)$.

Denotation (7e). Calling a method m consists of an abstract execution of its
body on the actual parameters, followed by the propagation of the effects of
m to the calling context (i.e., the input abstract state $I$). First, note that, in
the abstract semantics, reachability and cyclicity statements are only removed 
when a variable is assigned. Due to the use of shallow variables for the param- 
eters, statements about the formal parameters of m are never removed during 
an abstract execution of its body. Therefore, if, during the execution of m, the 
variable v reaches w, then, at the end of the method, v will be said to possi- 
bly reach w, even if this reachability is destroyed at some subsequent program 
point. This is similar to the way sharing information is dealt with in the present 
approach (following [28, 14]). 

Keeping track of cyclicity is rather easy. In addition to keeping all cyclicity
which is in $I$, a safe approximation is taken, which states that, if an argument
$v_i$ might become cyclic during the execution of m, then anything that shares
with it before the execution might also become cyclic. This is accounted for in
the definition of $I_4$, and is clearly safe. In fact, variables of the calling method
which are not arguments of the call, and do not share with any argument $v_i$,
cannot be affected by the execution of m.
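The cyclicity propagation just described can be sketched as follows. The effect of the body of m is abstracted into a hypothetical set `cyclic_after_m` of actual arguments that may be cyclic on exit, and sharing is encoded as unordered pairs (including a variable's pair with itself when it is non-null); both are assumptions for illustration, not the definition of $I_4$.

```python
from typing import FrozenSet, Iterable, Set

Sharing = Set[FrozenSet[str]]  # frozenset({a, b}) encodes (a . b); frozenset({a}) means a is non-null


def call_cyclicity(cyc: Set[str], sh: Sharing, variables: Set[str],
                   args: Iterable[str], cyclic_after_m: Set[str]) -> Set[str]:
    """Keep all cyclicity in I; if an argument may become cyclic in m,
    every variable sharing with it before the call may become cyclic too."""
    result = set(cyc)
    for vi in args:
        if vi in cyclic_after_m:
            result |= {w for w in variables if frozenset({w, vi}) in sh}
    return result


sh = {frozenset({x}) for x in "abcd"} | {frozenset({"a", "b"})}
after_call = call_cyclicity({"d"}, sh, {"a", "b", "c", "d"}, ["a"], {"a"})
```

Variable `c`, which neither is an argument nor shares with one, keeps its status, in line with the remark that such variables cannot be affected by m.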

The treatment of reachability is more complicated: in addition to $I$ and $I_m$
(which is introduced by the method for $v$), it is necessary to take into account
the effect of the method call on variables which are not arguments. This is done
in the definition of $I_1$, $I_2$, and $I_3$, which model the effects of m on variables which
share with its actual arguments. Consider two arguments $v_i$ and $v_j$ (where $i$
can be equal to $j$): a path between two variables $w_1$ and $w_2$ (which can be
arguments, or non-argument variables) can be created by m if (i) $v_i$ and $w_1$
share before the call, $v_j$ and $w_2$ alias before the call, $v_i$ is modified in m, and $v_i$
reaches $v_j$ after the call; or (ii) $v_i$ and $w_1$ share before the call, $v_j$ reaches $w_2$
before the call, $v_i$ is modified in m, and $v_i$ and $v_j$ share (without reaching each
other) after the call. The two cases are accounted for in the definitions of $I_1$ and
$I_2$, respectively, and are depicted in Fig. A.7. In both cases, the creation of the path
requires that an argument is modified in m (condition $v_i \in sh'$), and that $v_i$
and $v_j$ do not point to disjoint regions of the heap (i.e., either $v_i$ reaches $v_j$, or
they simply share). As a result, if these conditions are met, then the statement
$w_1 \rightsquigarrow w_2$ is added. It can be seen that this accounts for all cases where some
change in the arguments of m affects the reachability between non-argument
variables.

Finally, the definition considers all variables $v$ aliasing with the return value at the end of
m (note that these are the only new aliasing statements involving arguments
which can be created in the body of m): the information about them is cloned
for $p$.
[Figure A.7 image: two panels, each over the variables $w_1$, $v_i$, $v_j$, $w_2$]

Figure A.7: Scenarios where a path from $w_1$ to $w_2$ can be created inside m. Dashed arrows
represent reachability: they connect a variable to a reachable location (represented as a circle).
Solid arrows connect a variable $u$ to the location $\sigma(u)$ directly bound to it. Arrows labeled
with * are paths which are created inside m (strictly speaking, they could also exist before
the call), while the others existed before the method call. In both cases, it can be seen that
a reachability path from $w_1$ to $w_2$ is created, which contains a sub-path created inside m by
modifying its arguments.

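Denotation (1c). Suppose $\sigma^* \notin \gamma_{rc}^{\tau}(I^*)$. Then, according to the definition
of $\gamma_{rc}^{\tau}$, it must be the case that (i) $w_1$ reaches $w_2$ in $\sigma^*$ but $w_1 \rightsquigarrow w_2 \notin I^*$;
or (ii) $w$ is cyclic in $\sigma^*$ but $\circlearrowleft^{w} \notin I^*$. Suppose we are in case (i), and let
$\sigma_e = \mathcal{E}[\![exp]\!](\sigma)$ and $I_1 = \mathcal{E}^{\alpha}[\![exp]\!](I)$.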

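• If $w_1 \neq v \land w_2 \neq v$, then it must be the case that $w_1$ reaches $w_2$ in
$\sigma_e$. By the soundness of the denotations for expressions, we must have $\sigma_e \in
\gamma_{rc}^{\tau'}(I_1)$, which means that $w_1 \rightsquigarrow w_2 \in I_1$; thus, $w_1 \rightsquigarrow w_2 \in (\exists v.I_1)[p/v] =
I^*$, which contradicts $w_1 \rightsquigarrow w_2 \notin I^*$.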

• If $w_1 = v \land w_2 \neq v$, then it must be the case that $p$ reaches $w_2$ in $\sigma_e$.
By the soundness of the denotations for expressions, we must have $\sigma_e \in
\gamma_{rc}^{\tau'}(I_1)$, which means that $p \rightsquigarrow w_2 \in I_1$, and thus $v \rightsquigarrow w_2 \in (\exists v.I_1)[p/v] =
I^*$, which contradicts $v \rightsquigarrow w_2 \notin I^*$.

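• If $w_1 \neq v \land w_2 = v$, then it must be the case that $w_1$ reaches $p$ in $\sigma_e$.
By the soundness of the denotations for expressions, we must have $\sigma_e \in
\gamma_{rc}^{\tau'}(I_1)$, which means that $w_1 \rightsquigarrow p \in I_1$, and thus $w_1 \rightsquigarrow v \in (\exists v.I_1)[p/v] =
I^*$, which contradicts $w_1 \rightsquigarrow v \notin I^*$.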

• If $w_1 = v \land w_2 = v$, then it must be the case that $p$ reaches $p$ in $\sigma_e$.
By the soundness of the denotations for expressions, we must have $\sigma_e \in
\gamma_{rc}^{\tau'}(I_1)$, which means that $p \rightsquigarrow p \in I_1$, and thus $v \rightsquigarrow v \in (\exists v.I_1)[p/v] =
I^*$, which contradicts $v \rightsquigarrow v \notin I^*$.

Case (ii) can be done with similar reasoning. 
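All four bullets rely on the shape $I^* = (\exists v.I_1)[p/v]$: statements about the old value of $v$ are forgotten, and $p$ is then renamed to $v$. A minimal sketch of this step on the reachability and cyclicity components only (sharing and the other parts of the abstract state are not modeled, and the function name is hypothetical):

```python
from typing import Set, Tuple

Stmt = Tuple[str, str]  # (a, b) encodes a ~> b


def assign(reach_e: Set[Stmt], cyc_e: Set[str], v: str, p: str) -> Tuple[Set[Stmt], Set[str]]:
    """Compute (exists v. I_1)[p/v]: forget the old v, then rename p to v."""
    def ren(x: str) -> str:
        return v if x == p else x

    reach = {(ren(a), ren(b)) for (a, b) in reach_e if v not in (a, b)}
    cyc = {ren(x) for x in cyc_e if x != v}
    return reach, cyc


# I_1 after evaluating exp: p holds the value of exp, v still holds its old value.
reach_star, cyc_star = assign(
    {("v", "w"), ("p", "w"), ("u", "p"), ("p", "p")}, {"v", "p"}, "v", "p")
```

The statement `v ~> w` about the old value of `v` disappears, and the three statements about `p` reappear as `v ~> w`, `u ~> v`, and `v ~> v`, which are exactly the facts used in the four bullets.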

Denotation (2c). This case is trivial when $f$ has type int, since only side effects
during the evaluation of $exp$ have to be taken into account. If $f$ has reference
type, then this command is equivalent to first evaluating $exp$, and then executing
$v.f := p$. Let $\sigma' = \mathcal{E}[\![exp]\!](\sigma)$, and $\ell_e = \sigma'(p)$. If $v$ and $\ell_e$ are considered, then
there are two main cases (Fig. A.8): (a) $\sigma(v) = \ell_e$, or (b) $\sigma(v) \neq \ell_e$.



(a) In this case, a cycle on v is created, whose length is 1. If another variable 
u (possibly, v itself) sharing with v in a* is considered, then there are 
several possible scenarios in the heap, and soundness has to be proven for 
each of them. 

— $u$ aliases with $v$ or reaches $v$ (cases $u_1$ and $u_2$ in the left-hand side
of Fig. A.8). In this case, $u$ reaches $v$ via $f$, and this is taken into
account in the definition of $I_r$, where $u$ plays the role of $w_1$, and
$v$ plays the role of $w_2$. The result is that $I_r$ includes $u \rightsquigarrow v$, as
expected. The semantics correctly adds $v \rightsquigarrow v$ as well (in fact, $v$ can
play the role of both $w_1$ and $w_2$). As for cyclicity, the definition of
$I_c$ guarantees that $\circlearrowleft^{u}$ and $\circlearrowleft^{v}$ will belong to $I^*$.

— $v$ reaches $u$ (case $u_3$ in the same figure). In this case, $v \rightsquigarrow u \in I^*$
since, in the definition of $I_r$, $u$ plays the role of $w_2$ (note that $v$ and
$p$ alias). $v$ will also be considered as cyclic by the definition of $I_c$;

— $v$ and $u$ both reach a common location $\ell$ (case $u_4$). If none of the
previous cases happens, then $v$ and $u$ do not reach each other, so that
$I^*$ does not need to contain reachability statements between them.
In general, only $v$ will be considered as cyclic in this case (in the same
way as in the previous cases).

(b) In this case, when considering $u$, the number of possible scenarios for
reachability is larger. Moreover, there are two scenarios where $v$ would be
cyclic after the update: (i) $\ell_e$ reaches $v$, so that a cycle is created by the
field update, and $v$ becomes cyclic (if it was not already); or (ii) $\ell_e$ does
not reach $v$, so that $v$ is cyclic only if it was already cyclic in $\sigma$, and the
same applies to $\ell_e$. In case (ii), it can be easily seen that the definition of
$I_c$ accounts for the cyclicity of $v$ since $\circlearrowleft^{v}$ belongs to $I$ by soundness and
will not be removed. Case (i) will be discussed in the following, for each
scenario.

— $u$ reaches $v$ or aliases with it (cases $u_1$ and $u_2$ in the right-hand side
of Fig. A.8). In this case, it was also reaching $v$ (or aliasing with it)
in $\sigma'$, so that (in the case of reachability) $u \rightsquigarrow v \in I'$, which implies
$u \rightsquigarrow v \in I^*$, as soundness requires. As for cyclicity, in case (i), the
cyclicity of $u$ is detected because it reaches $v$.

— Cases $u_3$, $u_4$, and $u_5$. These cases are easy, because nothing changes
with respect to the reachability between $u$ and $v$, and all the statements
were already contained in $I$.

— $u$ points to $\ell_e$ or is reached by it (cases $u_6$ and $u_7$). In this case, $u$
plays the role of $w_2$ in the definition of $I_r$, and is correctly considered
to be reached by $v$. As for cyclicity, $u$ will only become cyclic in case
(i) if it points to $\ell_e$, or belongs to the cyclic path. In both cases, the
semantics accounts for it since $u$ would reach $u$, thus being considered
as cyclic (definition of $I_c$).

Figure A.8: The possible scenarios for case (2c): (a) $\ell_e$ and $\sigma(v)$ coincide (left); and (b) they
do not coincide (right). Variables $u_i$ represent the possible relations between the variable $u$ used in
the proof and the data structure modified by the field update. Double solid arrows stand for
field dereferencing, and are labeled with the name of the field. For the other kinds of arrows,
see Fig. A.7.

— $u$ and $\ell_e$ reach some common location $\ell$ (case $u_8$). Also easy since
nothing changes with respect to the reachability between $u$ and $v$.

Note that, due to the discussion in Section 5.2.3, the single-field optimization
introduced by condRemove is not problematic for soundness, since the removal of
statements is only applied if the required conditions about $v$ and $f$ are guaranteed
to hold. In any case, the conservative choice of taking $condRemove(I_0, v, f)$
to be $I_0$ itself is also sound.
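For reference, the two set comprehensions that the field-update rule adds (parts (a) and (b), discussed in the optimality proof of Section 5) can be transcribed as below. This sketch assumes a reading of those parts in which $w$ may become cyclic when $p$ reaches $v$, shares with $v$, or is already cyclic, and $w$ shares with or reaches $v$. It does not model $I'$, condRemove, or the final elimination of $p$, and all names are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

Stmt = Tuple[str, str]          # (a, b) encodes a ~> b
Sharing = Set[FrozenSet[str]]   # frozenset({a, b}) encodes (a . b); frozenset({a}) means a is non-null


def field_update_new_statements(reach: Set[Stmt], cyc: Set[str], sh: Sharing,
                                variables: Set[str], v: str, p: str
                                ) -> Tuple[Set[Stmt], Set[str]]:
    """Parts (a) and (b) added by v.f := p, computed from I' (reach, cyc) and I'_s (sh)."""
    def shares(a: str, b: str) -> bool:
        return frozenset({a, b}) in sh

    # (a) w1 ~> w2 if w1 may reach (or share with) v and p may reach (or share with) w2
    new_reach = {
        (w1, w2)
        for w1 in variables
        for w2 in variables
        if (shares(w1, v) or (w1, v) in reach) and (shares(p, w2) or (p, w2) in reach)
    }
    # (b) w may become cyclic if a cycle can pass through the new edge
    #     and w may reach (or share with) v
    cycle_through_update = (p, v) in reach or shares(p, v) or p in cyc
    new_cyc = {
        w for w in variables
        if cycle_through_update and (shares(w, v) or (w, v) in reach)
    }
    return new_reach, new_cyc


# Before x.f := p, p reaches x (so p and x also share).
sh = {frozenset({"x"}), frozenset({"p"}), frozenset({"x", "p"})}
new_reach, new_cyc = field_update_new_statements({("p", "x")}, set(), sh, {"x", "p"}, "x", "p")
```

Since `p` reaches `x`, the update closes a cycle, and both variables are reported as mutually reaching and possibly cyclic, which is the situation of case (b)(i) above.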

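Denotation (3c). This case is quite straightforward, given the inductive hypothesis
on $com_1$ and $com_2$, and the assumption that $exp$ has no side effects and
returns an int. Suppose $\sigma^* = \mathcal{C}[\![com_i]\!](\sigma)$ for some $i \in \{1, 2\}$; then, by the induction
hypothesis, $\sigma^* \in \gamma_{rc}(\mathcal{C}^{\alpha}[\![com_i]\!](I)) \subseteq \gamma_{rc}(\mathcal{C}^{\alpha}[\![com_1]\!](I) \sqcup \mathcal{C}^{\alpha}[\![com_2]\!](I)) = \gamma_{rc}(I^*)$.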

Denotations (4c), (5c), and (6c). Rules for loops and concatenation are easy,
given the inductive hypothesis on the sub-commands, and the definition of the 
fixpoint. The rule for the return command is also easy, being basically similar 
to variable assignment. 

The fact that all abstract denotations are sound with respect to the
concrete denotational semantics, together with Definition 5.7 and the definition
of a denotational semantics, proves the theorem. □